commit 717f769197ece6814ebfcb7e3e7066bbdec2106b
Author: Nate Williams <nate@FreeBSD.org>
Date:   Tue Jun 29 06:04:45 1993 +0000

    Gnu e?grep 1.6

Notes:
    svn path=/vendor/misc-GNU/dist1/; revision=61
    svn path=/vendor/misc-GNU/GREP_1_6/; revision=63; tag=vendor/misc-GNU/grep/1.6

diff --git a/gnu/usr.bin/grep/COPYING b/gnu/usr.bin/grep/COPYING
new file mode 100644
index 000000000000..a43ea2126fb6
--- /dev/null
+++ b/gnu/usr.bin/grep/COPYING
@@ -0,0 +1,339 @@
+		    GNU GENERAL PUBLIC LICENSE
+		       Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+                          675 Mass Ave, Cambridge, MA 02139, USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+			    Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+		    GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+  2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) You must cause the modified files to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    b) You must cause any work that you distribute or publish, that in
+    whole or in part contains or is derived from the Program or any
+    part thereof, to be licensed as a whole at no charge to all third
+    parties under the terms of this License.
+
+    c) If the modified program normally reads commands interactively
+    when run, you must cause it, when started running for such
+    interactive use in the most ordinary way, to print or display an
+    announcement including an appropriate copyright notice and a
+    notice that there is no warranty (or else, saying that you provide
+    a warranty) and that users may redistribute the program under
+    these conditions, and telling the user how to view a copy of this
+    License.  (Exception: if the Program itself is interactive but
+    does not normally print such an announcement, your work based on
+    the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code.  (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it.  For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable.  However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Program or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded.  In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation.  If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission.  For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this.  Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+			    NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+		     END OF TERMS AND CONDITIONS
+
+	Appendix: How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) 19yy  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; if not, write to the Free Software
+    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) 19yy name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Library General
+Public License instead of this License.
diff --git a/gnu/usr.bin/grep/Makefile b/gnu/usr.bin/grep/Makefile
new file mode 100644
index 000000000000..092e523ebe68
--- /dev/null
+++ b/gnu/usr.bin/grep/Makefile
@@ -0,0 +1,7 @@
+
+PROG=	grep
+SRCS=	dfa.c regex.o grep.o
+CFLAGS+=-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1
+MLINKS= grep.1 egrep.1
+
+.include <bsd.prog.mk>
diff --git a/gnu/usr.bin/grep/README b/gnu/usr.bin/grep/README
new file mode 100644
index 000000000000..27f5bae42d8d
--- /dev/null
+++ b/gnu/usr.bin/grep/README
@@ -0,0 +1,70 @@
+This README documents GNU e?grep version 1.6.  All bugs reported for
+previous versions have been fixed.
+
+See the file INSTALL for compilation and installation instructions.
+
+Send bug reports to bug-gnu-utils@prep.ai.mit.edu.
+
+GNU e?grep is provided "as is" with no warranty.  The exact terms
+under which you may use and (re)distribute this program are detailed
+in the GNU General Public License, in the file COPYING.
+
+GNU e?grep is based on a fast lazy-state deterministic matcher (about
+twice as fast as stock Unix egrep) hybridized with a Boyer-Moore-Gosper
+search for a fixed string that eliminates impossible text from being
+considered by the full regexp matcher without necessarily having to
+look at every character.  The result is typically many times faster
+than Unix grep or egrep.  (Regular expressions containing backreferencing
+may run more slowly, however.)
+
+GNU e?grep is brought to you by the efforts of several people:
+
+	Mike Haertel wrote the deterministic regexp code and the bulk
+	of the program.
+
+	James A. Woods is responsible for the hybridized search strategy
+	of using Boyer-Moore-Gosper fixed-string search as a filter
+	before calling the general regexp matcher.
+
+	Arthur David Olson contributed code that finds fixed strings for
+	the aforementioned BMG search for a large class of regexps.
+
+	Richard Stallman wrote the backtracking regexp matcher that is
+	used for \<digit> backreferences, as well as the getopt that
+	is provided for 4.2BSD sites.  The backtracking matcher was
+	originally written for GNU Emacs.
+
+	D. A. Gwyn wrote the C alloca emulation that is provided so
+	System V machines can run this program.  (Alloca is used only
+	by RMS' backtracking matcher, and then only rarely, so there
+	is no loss if your machine doesn't have a "real" alloca.)
+
+	Scott Anderson and Henry Spencer designed the regression tests
+	used in the "regress" script.
+
+	Paul Placeway wrote the manual page, based on this README.
+
+If you are interested in improving this program, you may wish to try
+any of the following:
+
+1.  Replace the fast search loop with a faster search loop.
+    There are several things that could be improved, the most notable
+    of which would be to calculate a minimal delta2 to use.
+
+2.  Make backreferencing \<digit> faster.  Right now, backreferencing is
+    handled by calling the Emacs backtracking matcher to verify the partial
+    match.  This is slow; if the DFA routines could handle backreferencing
+    themselves a speedup on the order of three to four times might occur
+    in those cases where the backtracking matcher is called to verify nearly
+    every line.  Also, some portability problems due to the inclusion of the
+    emacs matcher would be solved because it could then be eliminated.
+    Note that expressions with backreferencing are not true regular
+    expressions, and thus are not equivalent to any DFA.  So this is hard.
+
+3.  Handle POSIX style regexps.  I'm not sure if this could be called an
+    improvement; some of the things on regexps in the POSIX draft I have
+    seen are pretty sickening.  But it would be useful in the interests of
+    conforming to the standard.
+
+4.  Replace the main driver program grep.c with the much cleaner main driver
+    program used in GNU fgrep.
diff --git a/gnu/usr.bin/grep/dfa.c b/gnu/usr.bin/grep/dfa.c
new file mode 100644
index 000000000000..08b383d04bd7
--- /dev/null
+++ b/gnu/usr.bin/grep/dfa.c
@@ -0,0 +1,2276 @@
+/* dfa.c - determinisitic extended regexp routines for GNU
+   Copyright (C) 1988 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  */
+
+/* Written June, 1988 by Mike Haertel
+   Modified July, 1988 by Arthur David Olson to assist BMG speedups  */
+
+#include <stdio.h>
+#include <assert.h>
+
+#if defined(USG) || defined(STDC_HEADERS)
+#include <string.h>
+#ifndef index
+#define index strchr
+#endif
+#else
+#include <strings.h>
+#endif
+
+#include "dfa.h"
+
+#if __STDC__
+typedef void *ptr_t;
+#else
+typedef char *ptr_t;
+#endif
+
+static void	regmust();
+
+static ptr_t
+xcalloc(n, s)
+     int n;
+     size_t s;
+{
+  ptr_t r = calloc(n, s);
+
+  if (!r)
+    regerror("Memory exhausted");
+  return r;
+}
+
+ptr_t				/* Not static, so alloca.o can use it.  */
+xmalloc(n)
+     size_t n;
+{
+  ptr_t r = malloc(n);
+
+  assert(n != 0);
+  if (!r)
+    regerror("Memory exhausted");
+  return r;
+}
+
+static ptr_t
+xrealloc(p, n)
+     ptr_t p;
+     size_t n;
+{
+  ptr_t r = realloc(p, n);
+
+  assert(n != 0);
+  if (!r)
+    regerror("Memory exhausted");
+  return r;
+}
+
+#define CALLOC(p, t, n) ((p) = (t *) xcalloc((n), sizeof (t)))
+#define MALLOC(p, t, n) ((p) = (t *) xmalloc((n) * sizeof (t)))
+#define REALLOC(p, t, n) ((p) = (t *) xrealloc((ptr_t) (p), (n) * sizeof (t)))
+
+/* Reallocate an array of type t if nalloc is too small for index. */
+#define REALLOC_IF_NECESSARY(p, t, nalloc, index) \
+  if ((index) >= (nalloc))			  \
+    {						  \
+      while ((index) >= (nalloc))		  \
+	(nalloc) *= 2;				  \
+      REALLOC(p, t, nalloc);			  \
+    }
+
+#ifdef DEBUG
+#include <stdio.h>
+
+static void
+prtok(t)
+     _token t;
+{
+  char *s;
+
+  if (t < 0)
+    fprintf(stderr, "END");
+  else if (t < _NOTCHAR)
+    fprintf(stderr, "%c", t);
+  else
+    {
+      switch (t)
+	{
+	case _EMPTY: s = "EMPTY"; break;
+	case _BACKREF: s = "BACKREF"; break;
+	case _BEGLINE: s = "BEGLINE"; break;
+	case _ALLBEGLINE: s = "ALLBEGLINE"; break;
+	case _ENDLINE: s = "ENDLINE"; break;
+	case _ALLENDLINE: s = "ALLENDLINE"; break;
+	case _BEGWORD: s = "BEGWORD"; break;
+	case _ENDWORD: s = "ENDWORD"; break;
+	case _LIMWORD: s = "LIMWORD"; break;
+	case _NOTLIMWORD: s = "NOTLIMWORD"; break;
+	case _QMARK: s = "QMARK"; break;
+	case _STAR: s = "STAR"; break;
+	case _PLUS: s = "PLUS"; break;
+	case _CAT: s = "CAT"; break;
+	case _OR: s = "OR"; break;
+	case _LPAREN: s = "LPAREN"; break;
+	case _RPAREN: s = "RPAREN"; break;
+	default: s = "SET"; break;
+	}
+      fprintf(stderr, "%s", s);
+    }
+}
+#endif /* DEBUG */
+
+/* Stuff pertaining to charsets. */
+
+static int
+tstbit(b, c)
+     int b;
+     _charset c;
+{
+  return c[b / INTBITS] & 1 << b % INTBITS;
+}
+
+static void
+setbit(b, c)
+     int b;
+     _charset c;
+{
+  c[b / INTBITS] |= 1 << b % INTBITS;
+}
+
+static void
+clrbit(b, c)
+     int b;
+     _charset c;
+{
+  c[b / INTBITS] &= ~(1 << b % INTBITS);
+}
+
+static void
+copyset(src, dst)
+     const _charset src;
+     _charset dst;
+{
+  int i;
+
+  for (i = 0; i < _CHARSET_INTS; ++i)
+    dst[i] = src[i];
+}
+
+static void
+zeroset(s)
+     _charset s;
+{
+  int i;
+
+  for (i = 0; i < _CHARSET_INTS; ++i)
+    s[i] = 0;
+}
+
+static void
+notset(s)
+     _charset s;
+{
+  int i;
+
+  for (i = 0; i < _CHARSET_INTS; ++i)
+    s[i] = ~s[i];
+}
+
+static int
+equal(s1, s2)
+     const _charset s1;
+     const _charset s2;
+{
+  int i;
+
+  for (i = 0; i < _CHARSET_INTS; ++i)
+    if (s1[i] != s2[i])
+      return 0;
+  return 1;
+}
+
+/* A pointer to the current regexp is kept here during parsing. */
+static struct regexp *reg;
+
+/* Find the index of charset s in reg->charsets, or allocate a new charset. */
+static int
+charset_index(s)
+     const _charset s;
+{
+  int i;
+
+  for (i = 0; i < reg->cindex; ++i)
+    if (equal(s, reg->charsets[i]))
+      return i;
+  REALLOC_IF_NECESSARY(reg->charsets, _charset, reg->calloc, reg->cindex);
+  ++reg->cindex;
+  copyset(s, reg->charsets[i]);
+  return i;
+}
+
+/* Syntax bits controlling the behavior of the lexical analyzer. */
+static syntax_bits, syntax_bits_set;
+
+/* Flag for case-folding letters into sets. */
+static case_fold;
+
+/* Entry point to set syntax options. */
+void
+regsyntax(bits, fold)
+     int bits;
+     int fold;
+{
+  syntax_bits_set = 1;
+  syntax_bits = bits;
+  case_fold = fold;
+}
+
+/* Lexical analyzer. */
+static const char *lexstart;	/* Pointer to beginning of input string. */
+static const char *lexptr;	/* Pointer to next input character. */
+static lexleft;			/* Number of characters remaining. */
+static caret_allowed;		/* True if backward context allows ^
+				   (meaningful only if RE_CONTEXT_INDEP_OPS
+				   is turned off). */
+static closure_allowed;		/* True if backward context allows closures
+				   (meaningful only if RE_CONTEXT_INDEP_OPS
+				   is turned off). */
+
+/* Note that characters become unsigned here. */
+#define FETCH(c, eoferr)   	      \
+  {			   	      \
+    if (! lexleft)	   	      \
+      if (eoferr)	   	      \
+	regerror(eoferr);  	      \
+      else		   	      \
+	return _END;	   	      \
+    (c) = (unsigned char) *lexptr++;  \
+    --lexleft;		   	      \
+  }
+
+static _token
+lex()
+{
+  _token c, c2;
+  int invert;
+  _charset cset;
+
+  FETCH(c, (char *) 0);
+  switch (c)
+    {
+    case '^':
+      if (! (syntax_bits & RE_CONTEXT_INDEP_OPS)
+	  && (!caret_allowed ||
+	      (syntax_bits & RE_TIGHT_VBAR) && lexptr - 1 != lexstart))
+	goto normal_char;
+      caret_allowed = 0;
+      return syntax_bits & RE_TIGHT_VBAR ? _ALLBEGLINE : _BEGLINE;
+
+    case '$':
+      if (syntax_bits & RE_CONTEXT_INDEP_OPS || !lexleft
+	  || (! (syntax_bits & RE_TIGHT_VBAR)
+	      && ((syntax_bits & RE_NO_BK_PARENS
+		   ? lexleft > 0 && *lexptr == ')'
+		   : lexleft > 1 && *lexptr == '\\' && lexptr[1] == ')')
+		  || (syntax_bits & RE_NO_BK_VBAR
+		      ? lexleft > 0 && *lexptr == '|'
+		      : lexleft > 1 && *lexptr == '\\' && lexptr[1] == '|'))))
+	return syntax_bits & RE_TIGHT_VBAR ? _ALLENDLINE : _ENDLINE;
+      goto normal_char;
+
+    case '\\':
+      FETCH(c, "Unfinished \\ quote");
+      switch (c)
+	{
+	case '1':
+	case '2':
+	case '3':
+	case '4':
+	case '5':
+	case '6':
+	case '7':
+	case '8':
+	case '9':
+	  caret_allowed = 0;
+	  closure_allowed = 1;
+	  return _BACKREF;
+
+	case '<':
+	  caret_allowed = 0;
+	  return _BEGWORD;
+
+	case '>':
+	  caret_allowed = 0;
+	  return _ENDWORD;
+
+	case 'b':
+	  caret_allowed = 0;
+	  return _LIMWORD;
+
+	case 'B':
+	  caret_allowed = 0;
+	  return _NOTLIMWORD;
+
+	case 'w':
+	case 'W':
+	  zeroset(cset);
+	  for (c2 = 0; c2 < _NOTCHAR; ++c2)
+	    if (ISALNUM(c2))
+	      setbit(c2, cset);
+	  if (c == 'W')
+	    notset(cset);
+	  caret_allowed = 0;
+	  closure_allowed = 1;
+	  return _SET + charset_index(cset);
+
+	case '?':
+	  if (syntax_bits & RE_BK_PLUS_QM)
+	    goto qmark;
+	  goto normal_char;
+
+	case '+':
+	  if (syntax_bits & RE_BK_PLUS_QM)
+	    goto plus;
+	  goto normal_char;
+
+	case '|':
+	  if (! (syntax_bits & RE_NO_BK_VBAR))
+	    goto or;
+	  goto normal_char;
+
+	case '(':
+	  if (! (syntax_bits & RE_NO_BK_PARENS))
+	    goto lparen;
+	  goto normal_char;
+
+	case ')':
+	  if (! (syntax_bits & RE_NO_BK_PARENS))
+	    goto rparen;
+	  goto normal_char;
+
+	default:
+	  goto normal_char;
+	}
+
+    case '?':
+      if (syntax_bits & RE_BK_PLUS_QM)
+	goto normal_char;
+    qmark:
+      if (! (syntax_bits & RE_CONTEXT_INDEP_OPS) && !closure_allowed)
+	goto normal_char;
+      return _QMARK;
+
+    case '*':
+      if (! (syntax_bits & RE_CONTEXT_INDEP_OPS) && !closure_allowed)
+	goto normal_char;
+      return _STAR;
+
+    case '+':
+      if (syntax_bits & RE_BK_PLUS_QM)
+	goto normal_char;
+    plus:
+      if (! (syntax_bits & RE_CONTEXT_INDEP_OPS) && !closure_allowed)
+	goto normal_char;
+      return _PLUS;
+
+    case '|':
+      if (! (syntax_bits & RE_NO_BK_VBAR))
+	goto normal_char;
+    or:
+      caret_allowed = 1;
+      closure_allowed = 0;
+      return _OR;
+
+    case '\n':
+      if (! (syntax_bits & RE_NEWLINE_OR))
+	goto normal_char;
+      goto or;
+
+    case '(':
+      if (! (syntax_bits & RE_NO_BK_PARENS))
+	goto normal_char;
+    lparen:
+      caret_allowed = 1;
+      closure_allowed = 0;
+      return _LPAREN;
+
+    case ')':
+      if (! (syntax_bits & RE_NO_BK_PARENS))
+	goto normal_char;
+    rparen:
+      caret_allowed = 0;
+      closure_allowed = 1;
+      return _RPAREN;
+
+    case '.':
+      zeroset(cset);
+      notset(cset);
+      clrbit('\n', cset);
+      caret_allowed = 0;
+      closure_allowed = 1;
+      return _SET + charset_index(cset);
+
+    case '[':
+      zeroset(cset);
+      FETCH(c, "Unbalanced [");
+      if (c == '^')
+	{
+	  FETCH(c, "Unbalanced [");
+	  invert = 1;
+	}
+      else
+	invert = 0;
+      do
+	{
+	  FETCH(c2, "Unbalanced [");
+	  if (c2 == '-')
+	    {
+	      FETCH(c2, "Unbalanced [");
+	      while (c <= c2)
+		{
+		  setbit(c, cset);
+		  if (case_fold)
+		    if (ISUPPER(c))
+		      setbit(tolower(c), cset);
+		    else if (ISLOWER(c))
+		      setbit(toupper(c), cset);
+		  ++c;
+		}
+	      FETCH(c, "Unbalanced [");
+	    }
+	  else
+	    {
+	      setbit(c, cset);
+	      if (case_fold)
+		if (ISUPPER(c))
+		  setbit(tolower(c), cset);
+		else if (ISLOWER(c))
+		  setbit(toupper(c), cset);
+	      c = c2;
+	    }
+	}
+      while (c != ']');
+      if (invert)
+	notset(cset);
+      caret_allowed = 0;
+      closure_allowed = 1;
+      return _SET + charset_index(cset);
+
+    default:
+    normal_char:
+      caret_allowed = 0;
+      closure_allowed = 1;
+      if (case_fold && ISALPHA(c))
+	{
+	  zeroset(cset);
+	  if (isupper(c))
+	    c = tolower(c);
+	  setbit(c, cset);
+	  setbit(toupper(c), cset);
+	  return _SET + charset_index(cset);
+	}
+      return c;
+    }
+}
+
+/* Recursive descent parser for regular expressions. */
+
+static _token tok;		/* Lookahead token. */
+static depth;			/* Current depth of a hypothetical stack
+				   holding deferred productions.  This is
+				   used to determine the depth that will be
+				   required of the real stack later on in
+				   reganalyze(). */
+
+/* Add the given token to the parse tree, maintaining the depth count and
+   updating the maximum depth if necessary. */
+static void
+addtok(t)
+     _token t;
+{
+  REALLOC_IF_NECESSARY(reg->tokens, _token, reg->talloc, reg->tindex);
+  reg->tokens[reg->tindex++] = t;
+
+  switch (t)
+    {
+    case _QMARK:
+    case _STAR:
+    case _PLUS:
+      break;
+
+    case _CAT:
+    case _OR:
+      --depth;
+      break;
+
+    default:
+      ++reg->nleaves;
+    case _EMPTY:
+      ++depth;
+      break;
+    }
+  if (depth > reg->depth)
+    reg->depth = depth;
+}
+
+/* The grammar understood by the parser is as follows.
+
+   start:
+     regexp
+     _ALLBEGLINE regexp
+     regexp _ALLENDLINE
+     _ALLBEGLINE regexp _ALLENDLINE
+
+   regexp:
+     regexp _OR branch
+     branch
+
+   branch:
+     branch closure
+     closure
+
+   closure:
+     closure _QMARK
+     closure _STAR
+     closure _PLUS
+     atom
+
+   atom:
+     <normal character>
+     _SET
+     _BACKREF
+     _BEGLINE
+     _ENDLINE
+     _BEGWORD
+     _ENDWORD
+     _LIMWORD
+     _NOTLIMWORD
+     <empty>
+
+   The parser builds a parse tree in postfix form in an array of tokens. */
+
+#if __STDC__
+static void regexp(void);
+#else
+static void regexp();
+#endif
+
+static void
+atom()
+{
+  if (tok >= 0 && tok < _NOTCHAR || tok >= _SET || tok == _BACKREF
+      || tok == _BEGLINE || tok == _ENDLINE || tok == _BEGWORD
+      || tok == _ENDWORD || tok == _LIMWORD || tok == _NOTLIMWORD)
+    {
+      addtok(tok);
+      tok = lex();
+    }
+  else if (tok == _LPAREN)
+    {
+      tok = lex();
+      regexp();
+      if (tok != _RPAREN)
+	regerror("Unbalanced (");
+      tok = lex();
+    }
+  else
+    addtok(_EMPTY);
+}
+
+static void
+closure()
+{
+  atom();
+  while (tok == _QMARK || tok == _STAR || tok == _PLUS)
+    {
+      addtok(tok);
+      tok = lex();
+    }
+}
+
+static void
+branch()
+{
+  closure();
+  while (tok != _RPAREN && tok != _OR && tok != _ALLENDLINE && tok >= 0)
+    {
+      closure();
+      addtok(_CAT);
+    }
+}
+
+static void
+regexp()
+{
+  branch();
+  while (tok == _OR)
+    {
+      tok = lex();
+      branch();
+      addtok(_OR);
+    }
+}
+
+/* Main entry point for the parser.  S is a string to be parsed, len is the
+   length of the string, so s can include NUL characters.  R is a pointer to
+   the struct regexp to parse into. */
+void
+regparse(s, len, r)
+     const char *s;
+     size_t len;
+     struct regexp *r;
+{
+  reg = r;
+  lexstart = lexptr = s;
+  lexleft = len;
+  caret_allowed = 1;
+  closure_allowed = 0;
+
+  if (! syntax_bits_set)
+    regerror("No syntax specified");
+
+  tok = lex();
+  depth = r->depth;
+
+  if (tok == _ALLBEGLINE)
+    {
+      addtok(_BEGLINE);
+      tok = lex();
+      regexp();
+      addtok(_CAT);
+    }
+  else
+    regexp();
+
+  if (tok == _ALLENDLINE)
+    {
+      addtok(_ENDLINE);
+      addtok(_CAT);
+      tok = lex();
+    }
+
+  if (tok != _END)
+    regerror("Unbalanced )");
+
+  addtok(_END - r->nregexps);
+  addtok(_CAT);
+
+  if (r->nregexps)
+    addtok(_OR);
+
+  ++r->nregexps;
+}
+
+/* Some primitives for operating on sets of positions. */
+
+/* Copy one set to another; the destination must be large enough. */
+static void
+copy(src, dst)
+     const _position_set *src;
+     _position_set *dst;
+{
+  int i;
+
+  for (i = 0; i < src->nelem; ++i)
+    dst->elems[i] = src->elems[i];
+  dst->nelem = src->nelem;
+}
+
+/* Insert a position in a set.  Position sets are maintained in sorted
+   order according to index.  If position already exists in the set with
+   the same index then their constraints are logically or'd together.
+   S->elems must point to an array large enough to hold the resulting set. */
+static void
+insert(p, s)
+     _position p;
+     _position_set *s;
+{
+  int i;
+  _position t1, t2;
+
+  for (i = 0; i < s->nelem && p.index < s->elems[i].index; ++i)
+    ;
+  if (i < s->nelem && p.index == s->elems[i].index)
+    s->elems[i].constraint |= p.constraint;
+  else
+    {
+      t1 = p;
+      ++s->nelem;
+      while (i < s->nelem)
+	{
+	  t2 = s->elems[i];
+	  s->elems[i++] = t1;
+	  t1 = t2;
+	}
+    }
+}
+
+/* Merge two sets of positions into a third.  The result is exactly as if
+   the positions of both sets were inserted into an initially empty set. */
+static void
+merge(s1, s2, m)
+     _position_set *s1;
+     _position_set *s2;
+     _position_set *m;
+{
+  int i = 0, j = 0;
+
+  m->nelem = 0;
+  while (i < s1->nelem && j < s2->nelem)
+    if (s1->elems[i].index > s2->elems[j].index)
+      m->elems[m->nelem++] = s1->elems[i++];
+    else if (s1->elems[i].index < s2->elems[j].index)
+      m->elems[m->nelem++] = s2->elems[j++];
+    else
+      {
+	m->elems[m->nelem] = s1->elems[i++];
+	m->elems[m->nelem++].constraint |= s2->elems[j++].constraint;
+      }
+  while (i < s1->nelem)
+    m->elems[m->nelem++] = s1->elems[i++];
+  while (j < s2->nelem)
+    m->elems[m->nelem++] = s2->elems[j++];
+}
+
+/* Delete a position from a set. */
+static void
+delete(p, s)
+     _position p;
+     _position_set *s;
+{
+  int i;
+
+  for (i = 0; i < s->nelem; ++i)
+    if (p.index == s->elems[i].index)
+      break;
+  if (i < s->nelem)
+    for (--s->nelem; i < s->nelem; ++i)
+      s->elems[i] = s->elems[i + 1];
+}
+
+/* Find the index of the state corresponding to the given position set with
+   the given preceding context, or create a new state if there is no such
+   state.  Newline and letter tell whether we got here on a newline or
+   letter, respectively. */
+static int
+state_index(r, s, newline, letter)
+     struct regexp *r;
+     _position_set *s;
+     int newline;
+     int letter;
+{
+  int hash = 0;
+  int constraint;
+  int i, j;
+
+  newline = newline ? 1 : 0;
+  letter = letter ? 1 : 0;
+
+  for (i = 0; i < s->nelem; ++i)
+    hash ^= s->elems[i].index + s->elems[i].constraint;
+
+  /* Try to find a state that exactly matches the proposed one. */
+  for (i = 0; i < r->sindex; ++i)
+    {
+      if (hash != r->states[i].hash || s->nelem != r->states[i].elems.nelem
+	  || newline != r->states[i].newline || letter != r->states[i].letter)
+	continue;
+      for (j = 0; j < s->nelem; ++j)
+	if (s->elems[j].constraint
+	    != r->states[i].elems.elems[j].constraint
+	    || s->elems[j].index != r->states[i].elems.elems[j].index)
+	  break;
+      if (j == s->nelem)
+	return i;
+    }
+
+  /* We'll have to create a new state. */
+  REALLOC_IF_NECESSARY(r->states, _dfa_state, r->salloc, r->sindex);
+  r->states[i].hash = hash;
+  MALLOC(r->states[i].elems.elems, _position, s->nelem);
+  copy(s, &r->states[i].elems);
+  r->states[i].newline = newline;
+  r->states[i].letter = letter;
+  r->states[i].backref = 0;
+  r->states[i].constraint = 0;
+  r->states[i].first_end = 0;
+  for (j = 0; j < s->nelem; ++j)
+    if (r->tokens[s->elems[j].index] < 0)
+      {
+	constraint = s->elems[j].constraint;
+	if (_SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 0)
+	    || _SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 1)
+	    || _SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 0)
+	    || _SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 1))
+	  r->states[i].constraint |= constraint;
+	if (! r->states[i].first_end)
+	  r->states[i].first_end = r->tokens[s->elems[j].index];
+      }
+    else if (r->tokens[s->elems[j].index] == _BACKREF)
+      {
+	r->states[i].constraint = _NO_CONSTRAINT;
+	r->states[i].backref = 1;
+      }
+
+  ++r->sindex;
+
+  return i;
+}
+
+/* Find the epsilon closure of a set of positions.  If any position of the set
+   contains a symbol that matches the empty string in some context, replace
+   that position with the elements of its follow labeled with an appropriate
+   constraint.  Repeat exhaustively until no funny positions are left.
+   S->elems must be large enough to hold the result. */
+void
+epsclosure(s, r)
+     _position_set *s;
+     struct regexp *r;
+{
+  int i, j;
+  int *visited;
+  _position p, old;
+
+  MALLOC(visited, int, r->tindex);
+  for (i = 0; i < r->tindex; ++i)
+    visited[i] = 0;
+
+  for (i = 0; i < s->nelem; ++i)
+    if (r->tokens[s->elems[i].index] >= _NOTCHAR
+	&& r->tokens[s->elems[i].index] != _BACKREF
+	&& r->tokens[s->elems[i].index] < _SET)
+      {
+	old = s->elems[i];
+	p.constraint = old.constraint;
+	delete(s->elems[i], s);
+	if (visited[old.index])
+	  {
+	    --i;
+	    continue;
+	  }
+	visited[old.index] = 1;
+	switch (r->tokens[old.index])
+	  {
+	  case _BEGLINE:
+	    p.constraint &= _BEGLINE_CONSTRAINT;
+	    break;
+	  case _ENDLINE:
+	    p.constraint &= _ENDLINE_CONSTRAINT;
+	    break;
+	  case _BEGWORD:
+	    p.constraint &= _BEGWORD_CONSTRAINT;
+	    break;
+	  case _ENDWORD:
+	    p.constraint &= _ENDWORD_CONSTRAINT;
+	    break;
+	  case _LIMWORD:
+	    p.constraint &= _LIMWORD_CONSTRAINT;
+	    break;
+	  case _NOTLIMWORD:
+	    p.constraint &= _NOTLIMWORD_CONSTRAINT;
+	    break;
+	  default:
+	    break;
+	  }
+	for (j = 0; j < r->follows[old.index].nelem; ++j)
+	  {
+	    p.index = r->follows[old.index].elems[j].index;
+	    insert(p, s);
+	  }
+	/* Force rescan to start at the beginning. */
+	i = -1;
+      }
+
+  free(visited);
+}
+
+/* Perform bottom-up analysis on the parse tree, computing various functions.
+   Note that at this point, we're pretending constructs like \< are real
+   characters rather than constraints on what can follow them.
+
+   Nullable:  A node is nullable if it is at the root of a regexp that can
+   match the empty string.
+   *  _EMPTY leaves are nullable.
+   * No other leaf is nullable.
+   * A _QMARK or _STAR node is nullable.
+   * A _PLUS node is nullable if its argument is nullable.
+   * A _CAT node is nullable if both its arguments are nullable.
+   * An _OR node is nullable if either argument is nullable.
+
+   Firstpos:  The firstpos of a node is the set of positions (nonempty leaves)
+   that could correspond to the first character of a string matching the
+   regexp rooted at the given node.
+   * _EMPTY leaves have empty firstpos.
+   * The firstpos of a nonempty leaf is that leaf itself.
+   * The firstpos of a _QMARK, _STAR, or _PLUS node is the firstpos of its
+     argument.
+   * The firstpos of a _CAT node is the firstpos of the left argument, union
+     the firstpos of the right if the left argument is nullable.
+   * The firstpos of an _OR node is the union of firstpos of each argument.
+
+   Lastpos:  The lastpos of a node is the set of positions that could
+   correspond to the last character of a string matching the regexp at
+   the given node.
+   * _EMPTY leaves have empty lastpos.
+   * The lastpos of a nonempty leaf is that leaf itself.
+   * The lastpos of a _QMARK, _STAR, or _PLUS node is the lastpos of its
+     argument.
+   * The lastpos of a _CAT node is the lastpos of its right argument, union
+     the lastpos of the left if the right argument is nullable.
+   * The lastpos of an _OR node is the union of the lastpos of each argument.
+
+   Follow:  The follow of a position is the set of positions that could
+   correspond to the character following a character matching the node in
+   a string matching the regexp.  At this point we consider special symbols
+   that match the empty string in some context to be just normal characters.
+   Later, if we find that a special symbol is in a follow set, we will
+   replace it with the elements of its follow, labeled with an appropriate
+   constraint.
+   * Every node in the firstpos of the argument of a _STAR or _PLUS node is in
+     the follow of every node in the lastpos.
+   * Every node in the firstpos of the second argument of a _CAT node is in
+     the follow of every node in the lastpos of the first argument.
+
+   Because of the postfix representation of the parse tree, the depth-first
+   analysis is conveniently done by a linear scan with the aid of a stack.
+   Sets are stored as arrays of the elements, obeying a stack-like allocation
+   scheme; the number of elements in each set deeper in the stack can be
+   used to determine the address of a particular set's array. */
+void
+reganalyze(r, searchflag)
+     struct regexp *r;
+     int searchflag;
+{
+  int *nullable;		/* Nullable stack. */
+  int *nfirstpos;		/* Element count stack for firstpos sets. */
+  _position *firstpos;		/* Array where firstpos elements are stored. */
+  int *nlastpos;		/* Element count stack for lastpos sets. */
+  _position *lastpos;		/* Array where lastpos elements are stored. */
+  int *nalloc;			/* Sizes of arrays allocated to follow sets. */
+  _position_set tmp;		/* Temporary set for merging sets. */
+  _position_set merged;		/* Result of merging sets. */
+  int wants_newline;		/* True if some position wants newline info. */
+  int *o_nullable;
+  int *o_nfirst, *o_nlast;
+  _position *o_firstpos, *o_lastpos;
+  int i, j;
+  _position *pos;
+
+#ifdef DEBUG
+  fprintf(stderr, "reganalyze:\n");
+  for (i = 0; i < r->tindex; ++i)
+    {
+      fprintf(stderr, " %d:", i);
+      prtok(r->tokens[i]);
+    }
+  putc('\n', stderr);
+#endif
+
+  r->searchflag = searchflag;
+
+  MALLOC(nullable, int, r->depth);
+  o_nullable = nullable;
+  MALLOC(nfirstpos, int, r->depth);
+  o_nfirst = nfirstpos;
+  MALLOC(firstpos, _position, r->nleaves);
+  o_firstpos = firstpos, firstpos += r->nleaves;
+  MALLOC(nlastpos, int, r->depth);
+  o_nlast = nlastpos;
+  MALLOC(lastpos, _position, r->nleaves);
+  o_lastpos = lastpos, lastpos += r->nleaves;
+  MALLOC(nalloc, int, r->tindex);
+  for (i = 0; i < r->tindex; ++i)
+    nalloc[i] = 0;
+  MALLOC(merged.elems, _position, r->nleaves);
+
+  CALLOC(r->follows, _position_set, r->tindex);
+
+  for (i = 0; i < r->tindex; ++i)
+#ifdef DEBUG
+    {				/* Nonsyntactic #ifdef goo... */
+#endif
+    switch (r->tokens[i])
+      {
+      case _EMPTY:
+	/* The empty set is nullable. */
+	*nullable++ = 1;
+
+	/* The firstpos and lastpos of the empty leaf are both empty. */
+	*nfirstpos++ = *nlastpos++ = 0;
+	break;
+
+      case _STAR:
+      case _PLUS:
+	/* Every element in the firstpos of the argument is in the follow
+	   of every element in the lastpos. */
+	tmp.nelem = nfirstpos[-1];
+	tmp.elems = firstpos;
+	pos = lastpos;
+	for (j = 0; j < nlastpos[-1]; ++j)
+	  {
+	    merge(&tmp, &r->follows[pos[j].index], &merged);
+	    REALLOC_IF_NECESSARY(r->follows[pos[j].index].elems, _position,
+				 nalloc[pos[j].index], merged.nelem - 1);
+	    copy(&merged, &r->follows[pos[j].index]);
+	  }
+
+      case _QMARK:
+	/* A _QMARK or _STAR node is automatically nullable. */
+	if (r->tokens[i] != _PLUS)
+	  nullable[-1] = 1;
+	break;
+
+      case _CAT:
+	/* Every element in the firstpos of the second argument is in the
+	   follow of every element in the lastpos of the first argument. */
+	tmp.nelem = nfirstpos[-1];
+	tmp.elems = firstpos;
+	pos = lastpos + nlastpos[-1];
+	for (j = 0; j < nlastpos[-2]; ++j)
+	  {
+	    merge(&tmp, &r->follows[pos[j].index], &merged);
+	    REALLOC_IF_NECESSARY(r->follows[pos[j].index].elems, _position,
+				 nalloc[pos[j].index], merged.nelem - 1);
+	    copy(&merged, &r->follows[pos[j].index]);
+	  }
+
+	/* The firstpos of a _CAT node is the firstpos of the first argument,
+	   union that of the second argument if the first is nullable. */
+	if (nullable[-2])
+	  nfirstpos[-2] += nfirstpos[-1];
+	else
+	  firstpos += nfirstpos[-1];
+	--nfirstpos;
+
+	/* The lastpos of a _CAT node is the lastpos of the second argument,
+	   union that of the first argument if the second is nullable. */
+	if (nullable[-1])
+	  nlastpos[-2] += nlastpos[-1];
+	else
+	  {
+	    pos = lastpos + nlastpos[-2];
+	    for (j = nlastpos[-1] - 1; j >= 0; --j)
+	      pos[j] = lastpos[j];
+	    lastpos += nlastpos[-2];
+	    nlastpos[-2] = nlastpos[-1];
+	  }
+	--nlastpos;
+
+	/* A _CAT node is nullable if both arguments are nullable. */
+	nullable[-2] = nullable[-1] && nullable[-2];
+	--nullable;
+	break;
+
+      case _OR:
+	/* The firstpos is the union of the firstpos of each argument. */
+	nfirstpos[-2] += nfirstpos[-1];
+	--nfirstpos;
+
+	/* The lastpos is the union of the lastpos of each argument. */
+	nlastpos[-2] += nlastpos[-1];
+	--nlastpos;
+
+	/* An _OR node is nullable if either argument is nullable. */
+	nullable[-2] = nullable[-1] || nullable[-2];
+	--nullable;
+	break;
+
+      default:
+	/* Anything else is a nonempty position.  (Note that special
+	   constructs like \< are treated as nonempty strings here;
+	   an "epsilon closure" effectively makes them nullable later.
+	   Backreferences have to get a real position so we can detect
+	   transitions on them later.  But they are nullable. */
+	*nullable++ = r->tokens[i] == _BACKREF;
+
+	/* This position is in its own firstpos and lastpos. */
+	*nfirstpos++ = *nlastpos++ = 1;
+	--firstpos, --lastpos;
+	firstpos->index = lastpos->index = i;
+	firstpos->constraint = lastpos->constraint = _NO_CONSTRAINT;
+
+	/* Allocate the follow set for this position. */
+	nalloc[i] = 1;
+	MALLOC(r->follows[i].elems, _position, nalloc[i]);
+	break;
+      }
+#ifdef DEBUG
+    /* ... balance the above nonsyntactic #ifdef goo... */
+      fprintf(stderr, "node %d:", i);
+      prtok(r->tokens[i]);
+      putc('\n', stderr);
+      fprintf(stderr, nullable[-1] ? " nullable: yes\n" : " nullable: no\n");
+      fprintf(stderr, " firstpos:");
+      for (j = nfirstpos[-1] - 1; j >= 0; --j)
+	{
+	  fprintf(stderr, " %d:", firstpos[j].index);
+	  prtok(r->tokens[firstpos[j].index]);
+	}
+      fprintf(stderr, "\n lastpos:");
+      for (j = nlastpos[-1] - 1; j >= 0; --j)
+	{
+	  fprintf(stderr, " %d:", lastpos[j].index);
+	  prtok(r->tokens[lastpos[j].index]);
+	}
+      putc('\n', stderr);
+    }
+#endif
+
+  /* For each follow set that is the follow set of a real position, replace
+     it with its epsilon closure. */
+  for (i = 0; i < r->tindex; ++i)
+    if (r->tokens[i] < _NOTCHAR || r->tokens[i] == _BACKREF
+	|| r->tokens[i] >= _SET)
+      {
+#ifdef DEBUG
+	fprintf(stderr, "follows(%d:", i);
+	prtok(r->tokens[i]);
+	fprintf(stderr, "):");
+	for (j = r->follows[i].nelem - 1; j >= 0; --j)
+	  {
+	    fprintf(stderr, " %d:", r->follows[i].elems[j].index);
+	    prtok(r->tokens[r->follows[i].elems[j].index]);
+	  }
+	putc('\n', stderr);
+#endif
+	copy(&r->follows[i], &merged);
+	epsclosure(&merged, r);
+	if (r->follows[i].nelem < merged.nelem)
+	  REALLOC(r->follows[i].elems, _position, merged.nelem);
+	copy(&merged, &r->follows[i]);
+      }
+
+  /* Get the epsilon closure of the firstpos of the regexp.  The result will
+     be the set of positions of state 0. */
+  merged.nelem = 0;
+  for (i = 0; i < nfirstpos[-1]; ++i)
+    insert(firstpos[i], &merged);
+  epsclosure(&merged, r);
+
+  /* Check if any of the positions of state 0 will want newline context. */
+  wants_newline = 0;
+  for (i = 0; i < merged.nelem; ++i)
+    if (_PREV_NEWLINE_DEPENDENT(merged.elems[i].constraint))
+      wants_newline = 1;
+
+  /* Build the initial state. */
+  r->salloc = 1;
+  r->sindex = 0;
+  MALLOC(r->states, _dfa_state, r->salloc);
+  state_index(r, &merged, wants_newline, 0);
+
+  free(o_nullable);
+  free(o_nfirst);
+  free(o_firstpos);
+  free(o_nlast);
+  free(o_lastpos);
+  free(nalloc);
+  free(merged.elems);
+}
+
+/* Find, for each character, the transition out of state s of r, and store
+   it in the appropriate slot of trans.
+
+   We divide the positions of s into groups (positions can appear in more
+   than one group).  Each group is labeled with a set of characters that
+   every position in the group matches (taking into account, if necessary,
+   preceding context information of s).  For each group, find the union
+   of the its elements' follows.  This set is the set of positions of the
+   new state.  For each character in the group's label, set the transition
+   on this character to be to a state corresponding to the set's positions,
+   and its associated backward context information, if necessary.
+
+   If we are building a searching matcher, we include the positions of state
+   0 in every state.
+
+   The collection of groups is constructed by building an equivalence-class
+   partition of the positions of s.
+
+   For each position, find the set of characters C that it matches.  Eliminate
+   any characters from C that fail on grounds of backward context.
+
+   Search through the groups, looking for a group whose label L has nonempty
+   intersection with C.  If L - C is nonempty, create a new group labeled
+   L - C and having the same positions as the current group, and set L to
+   the intersection of L and C.  Insert the position in this group, set
+   C = C - L, and resume scanning.
+
+   If after comparing with every group there are characters remaining in C,
+   create a new group labeled with the characters of C and insert this
+   position in that group. */
+void
+regstate(s, r, trans)
+     int s;
+     struct regexp *r;
+     int trans[];
+{
+  _position_set grps[_NOTCHAR];	/* As many as will ever be needed. */
+  _charset labels[_NOTCHAR];	/* Labels corresponding to the groups. */
+  int ngrps = 0;		/* Number of groups actually used. */
+  _position pos;		/* Current position being considered. */
+  _charset matches;		/* Set of matching characters. */
+  int matchesf;			/* True if matches is nonempty. */
+  _charset intersect;		/* Intersection with some label set. */
+  int intersectf;		/* True if intersect is nonempty. */
+  _charset leftovers;		/* Stuff in the label that didn't match. */
+  int leftoversf;		/* True if leftovers is nonempty. */
+  static _charset letters;	/* Set of characters considered letters. */
+  static _charset newline;	/* Set of characters that aren't newline. */
+  _position_set follows;	/* Union of the follows of some group. */
+  _position_set tmp;		/* Temporary space for merging sets. */
+  int state;			/* New state. */
+  int wants_newline;		/* New state wants to know newline context. */
+  int state_newline;		/* New state on a newline transition. */
+  int wants_letter;		/* New state wants to know letter context. */
+  int state_letter;		/* New state on a letter transition. */
+  static initialized;		/* Flag for static initialization. */
+  int i, j, k;
+
+  /* Initialize the set of letters, if necessary. */
+  if (! initialized)
+    {
+      initialized = 1;
+      for (i = 0; i < _NOTCHAR; ++i)
+	if (ISALNUM(i))
+	  setbit(i, letters);
+      setbit('\n', newline);
+    }
+
+  zeroset(matches);
+
+  for (i = 0; i < r->states[s].elems.nelem; ++i)
+    {
+      pos = r->states[s].elems.elems[i];
+      if (r->tokens[pos.index] >= 0 && r->tokens[pos.index] < _NOTCHAR)
+	setbit(r->tokens[pos.index], matches);
+      else if (r->tokens[pos.index] >= _SET)
+	copyset(r->charsets[r->tokens[pos.index] - _SET], matches);
+      else
+	continue;
+
+      /* Some characters may need to be eliminated from matches because
+	 they fail in the current context. */
+      if (pos.constraint != 0xFF)
+	{
+	  if (! _MATCHES_NEWLINE_CONTEXT(pos.constraint,
+					 r->states[s].newline, 1))
+	    clrbit('\n', matches);
+	  if (! _MATCHES_NEWLINE_CONTEXT(pos.constraint,
+					 r->states[s].newline, 0))
+	    for (j = 0; j < _CHARSET_INTS; ++j)
+	      matches[j] &= newline[j];
+	  if (! _MATCHES_LETTER_CONTEXT(pos.constraint,
+					r->states[s].letter, 1))
+	    for (j = 0; j < _CHARSET_INTS; ++j)
+	      matches[j] &= ~letters[j];
+	  if (! _MATCHES_LETTER_CONTEXT(pos.constraint,
+					r->states[s].letter, 0))
+	    for (j = 0; j < _CHARSET_INTS; ++j)
+	      matches[j] &= letters[j];
+
+	  /* If there are no characters left, there's no point in going on. */
+	  for (j = 0; j < _CHARSET_INTS && !matches[j]; ++j)
+	    ;
+	  if (j == _CHARSET_INTS)
+	    continue;
+	}
+
+      for (j = 0; j < ngrps; ++j)
+	{
+	  /* If matches contains a single character only, and the current
+	     group's label doesn't contain that character, go on to the
+	     next group. */
+	  if (r->tokens[pos.index] >= 0 && r->tokens[pos.index] < _NOTCHAR
+	      && !tstbit(r->tokens[pos.index], labels[j]))
+	    continue;
+
+	  /* Check if this group's label has a nonempty intersection with
+	     matches. */
+	  intersectf = 0;
+	  for (k = 0; k < _CHARSET_INTS; ++k)
+	    (intersect[k] = matches[k] & labels[j][k]) ? intersectf = 1 : 0;
+	  if (! intersectf)
+	    continue;
+
+	  /* It does; now find the set differences both ways. */
+	  leftoversf = matchesf = 0;
+	  for (k = 0; k < _CHARSET_INTS; ++k)
+	    {
+	      /* Even an optimizing compiler can't know this for sure. */
+	      int match = matches[k], label = labels[j][k];
+
+	      (leftovers[k] = ~match & label) ? leftoversf = 1 : 0;
+	      (matches[k] = match & ~label) ? matchesf = 1 : 0;
+	    }
+
+	  /* If there were leftovers, create a new group labeled with them. */
+	  if (leftoversf)
+	    {
+	      copyset(leftovers, labels[ngrps]);
+	      copyset(intersect, labels[j]);
+	      MALLOC(grps[ngrps].elems, _position, r->nleaves);
+	      copy(&grps[j], &grps[ngrps]);
+	      ++ngrps;
+	    }
+
+	  /* Put the position in the current group.  Note that there is no
+	     reason to call insert() here. */
+	  grps[j].elems[grps[j].nelem++] = pos;
+
+	  /* If every character matching the current position has been
+	     accounted for, we're done. */
+	  if (! matchesf)
+	    break;
+	}
+
+      /* If we've passed the last group, and there are still characters
+	 unaccounted for, then we'll have to create a new group. */
+      if (j == ngrps)
+	{
+	  copyset(matches, labels[ngrps]);
+	  zeroset(matches);
+	  MALLOC(grps[ngrps].elems, _position, r->nleaves);
+	  grps[ngrps].nelem = 1;
+	  grps[ngrps].elems[0] = pos;
+	  ++ngrps;
+	}
+    }
+
+  MALLOC(follows.elems, _position, r->nleaves);
+  MALLOC(tmp.elems, _position, r->nleaves);
+
+  /* If we are a searching matcher, the default transition is to a state
+     containing the positions of state 0, otherwise the default transition
+     is to fail miserably. */
+  if (r->searchflag)
+    {
+      wants_newline = 0;
+      wants_letter = 0;
+      for (i = 0; i < r->states[0].elems.nelem; ++i)
+	{
+	  if (_PREV_NEWLINE_DEPENDENT(r->states[0].elems.elems[i].constraint))
+	    wants_newline = 1;
+	  if (_PREV_LETTER_DEPENDENT(r->states[0].elems.elems[i].constraint))
+	    wants_letter = 1;
+	}
+      copy(&r->states[0].elems, &follows);
+      state = state_index(r, &follows, 0, 0);
+      if (wants_newline)
+	state_newline = state_index(r, &follows, 1, 0);
+      else
+	state_newline = state;
+      if (wants_letter)
+	state_letter = state_index(r, &follows, 0, 1);
+      else
+	state_letter = state;
+      for (i = 0; i < _NOTCHAR; ++i)
+	if (i == '\n')
+	  trans[i] = state_newline;
+	else if (ISALNUM(i))
+	  trans[i] = state_letter;
+	else
+	  trans[i] = state;
+    }
+  else
+    for (i = 0; i < _NOTCHAR; ++i)
+      trans[i] = -1;
+
+  for (i = 0; i < ngrps; ++i)
+    {
+      follows.nelem = 0;
+
+      /* Find the union of the follows of the positions of the group.
+	 This is a hideously inefficient loop.  Fix it someday. */
+      for (j = 0; j < grps[i].nelem; ++j)
+	for (k = 0; k < r->follows[grps[i].elems[j].index].nelem; ++k)
+	  insert(r->follows[grps[i].elems[j].index].elems[k], &follows);
+
+      /* If we are building a searching matcher, throw in the positions
+	 of state 0 as well. */
+      if (r->searchflag)
+	for (j = 0; j < r->states[0].elems.nelem; ++j)
+	  insert(r->states[0].elems.elems[j], &follows);
+
+      /* Find out if the new state will want any context information. */
+      wants_newline = 0;
+      if (tstbit('\n', labels[i]))
+	for (j = 0; j < follows.nelem; ++j)
+	  if (_PREV_NEWLINE_DEPENDENT(follows.elems[j].constraint))
+	    wants_newline = 1;
+
+      wants_letter = 0;
+      for (j = 0; j < _CHARSET_INTS; ++j)
+	if (labels[i][j] & letters[j])
+	  break;
+      if (j < _CHARSET_INTS)
+	for (j = 0; j < follows.nelem; ++j)
+	  if (_PREV_LETTER_DEPENDENT(follows.elems[j].constraint))
+	    wants_letter = 1;
+
+      /* Find the state(s) corresponding to the union of the follows. */
+      state = state_index(r, &follows, 0, 0);
+      if (wants_newline)
+	state_newline = state_index(r, &follows, 1, 0);
+      else
+	state_newline = state;
+      if (wants_letter)
+	state_letter = state_index(r, &follows, 0, 1);
+      else
+	state_letter = state;
+
+      /* Set the transitions for each character in the current label. */
+      for (j = 0; j < _CHARSET_INTS; ++j)
+	for (k = 0; k < INTBITS; ++k)
+	  if (labels[i][j] & 1 << k)
+	    {
+	      int c = j * INTBITS + k;
+
+	      if (c == '\n')
+		trans[c] = state_newline;
+	      else if (ISALNUM(c))
+		trans[c] = state_letter;
+	      else if (c < _NOTCHAR)
+		trans[c] = state;
+	    }
+    }
+
+  for (i = 0; i < ngrps; ++i)
+    free(grps[i].elems);
+  free(follows.elems);
+  free(tmp.elems);
+}
+
+/* Some routines for manipulating a compiled regexp's transition tables.
+   Each state may or may not have a transition table; if it does, and it
+   is a non-accepting state, then r->trans[state] points to its table.
+   If it is an accepting state then r->fails[state] points to its table.
+   If it has no table at all, then r->trans[state] is NULL.
+   TODO: Improve this comment, get rid of the unnecessary redundancy. */
+
+static void
+build_state(s, r)
+     int s;
+     struct regexp *r;
+{
+  int *trans;			/* The new transition table. */
+  int i;
+
+  /* Set an upper limit on the number of transition tables that will ever
+     exist at once.  1024 is arbitrary.  The idea is that the frequently
+     used transition tables will be quickly rebuilt, whereas the ones that
+     were only needed once or twice will be cleared away. */
+  if (r->trcount >= 1024)
+    {
+      for (i = 0; i < r->tralloc; ++i)
+	if (r->trans[i])
+	  {
+	    free((ptr_t) r->trans[i]);
+	    r->trans[i] = NULL;
+	  }
+	else if (r->fails[i])
+	  {
+	    free((ptr_t) r->fails[i]);
+	    r->fails[i] = NULL;
+	  }
+      r->trcount = 0;
+    }
+
+  ++r->trcount;
+
+  /* Set up the success bits for this state. */
+  r->success[s] = 0;
+  if (ACCEPTS_IN_CONTEXT(r->states[s].newline, 1, r->states[s].letter, 0,
+      s, *r))
+    r->success[s] |= 4;
+  if (ACCEPTS_IN_CONTEXT(r->states[s].newline, 0, r->states[s].letter, 1,
+      s, *r))
+    r->success[s] |= 2;
+  if (ACCEPTS_IN_CONTEXT(r->states[s].newline, 0, r->states[s].letter, 0,
+      s, *r))
+    r->success[s] |= 1;
+
+  MALLOC(trans, int, _NOTCHAR);
+  regstate(s, r, trans);
+
+  /* Now go through the new transition table, and make sure that the trans
+     and fail arrays are allocated large enough to hold a pointer for the
+     largest state mentioned in the table. */
+  for (i = 0; i < _NOTCHAR; ++i)
+    if (trans[i] >= r->tralloc)
+      {
+	int oldalloc = r->tralloc;
+
+	while (trans[i] >= r->tralloc)
+	  r->tralloc *= 2;
+	REALLOC(r->realtrans, int *, r->tralloc + 1);
+	r->trans = r->realtrans + 1;
+	REALLOC(r->fails, int *, r->tralloc);
+	REALLOC(r->success, int, r->tralloc);
+	REALLOC(r->newlines, int, r->tralloc);
+	while (oldalloc < r->tralloc)
+	  {
+	    r->trans[oldalloc] = NULL;
+	    r->fails[oldalloc++] = NULL;
+	  }
+      }
+
+  /* Keep the newline transition in a special place so we can use it as
+     a sentinel. */
+  r->newlines[s] = trans['\n'];
+  trans['\n'] = -1;
+
+  if (ACCEPTING(s, *r))
+    r->fails[s] = trans;
+  else
+    r->trans[s] = trans;
+}
+
+static void
+build_state_zero(r)
+     struct regexp *r;
+{
+  r->tralloc = 1;
+  r->trcount = 0;
+  CALLOC(r->realtrans, int *, r->tralloc + 1);
+  r->trans = r->realtrans + 1;
+  CALLOC(r->fails, int *, r->tralloc);
+  MALLOC(r->success, int, r->tralloc);
+  MALLOC(r->newlines, int, r->tralloc);
+  build_state(0, r);
+}
+
+/* Search through a buffer looking for a match to the given struct regexp.
+   Find the first occurrence of a string matching the regexp in the buffer,
+   and the shortest possible version thereof.  Return a pointer to the first
+   character after the match, or NULL if none is found.  Begin points to
+   the beginning of the buffer, and end points to the first character after
+   its end.  We store a newline in *end to act as a sentinel, so end had
+   better point somewhere valid.  Newline is a flag indicating whether to
+   allow newlines to be in the matching string.  If count is non-
+   NULL it points to a place we're supposed to increment every time we
+   see a newline.  Finally, if backref is non-NULL it points to a place
+   where we're supposed to store a 1 if backreferencing happened and the
+   match needs to be verified by a backtracking matcher.  Otherwise
+   we store a 0 in *backref. */
+char *
+regexecute(r, begin, end, newline, count, backref)
+     struct regexp *r;
+     char *begin;
+     char *end;
+     int newline;
+     int *count;
+     int *backref;
+{
+  register s, s1, tmp;		/* Current state. */
+  register unsigned char *p;	/* Current input character. */
+  register **trans, *t;		/* Copy of r->trans so it can be optimized
+				   into a register. */
+  static sbit[_NOTCHAR];	/* Table for anding with r->success. */
+  static sbit_init;
+
+  if (! sbit_init)
+    {
+      int i;
+
+      sbit_init = 1;
+      for (i = 0; i < _NOTCHAR; ++i)
+	if (i == '\n')
+	  sbit[i] = 4;
+	else if (ISALNUM(i))
+	  sbit[i] = 2;
+	else
+	  sbit[i] = 1;
+    }
+
+  if (! r->tralloc)
+    build_state_zero(r);
+
+  s = 0;
+  p = (unsigned char *) begin;
+  trans = r->trans;
+  *end = '\n';
+
+  for (;;)
+    {
+      /* The dreaded inner loop. */
+      if (t = trans[s])
+	do
+	  {
+	    s1 = t[*p++];
+	    if (! (t = trans[s1]))
+	      goto last_was_s;
+	    s = t[*p++];
+	  }
+        while (t = trans[s]);
+      goto last_was_s1;
+    last_was_s:
+      tmp = s, s = s1, s1 = tmp;
+    last_was_s1:
+
+      if (s >= 0 && p <= (unsigned char *) end && r->fails[s])
+	{
+	  if (r->success[s] & sbit[*p])
+	    {
+	      if (backref)
+		if (r->states[s].backref)
+		  *backref = 1;
+		else
+		  *backref = 0;
+	      return (char *) p;
+	    }
+
+	  s1 = s;
+	  s = r->fails[s][*p++];
+	  continue;
+	}
+
+      /* If the previous character was a newline, count it. */
+      if (count && (char *) p <= end && p[-1] == '\n')
+	++*count;
+
+      /* Check if we've run off the end of the buffer. */
+      if ((char *) p >= end)
+	return NULL;
+
+      if (s >= 0)
+	{
+	  build_state(s, r);
+	  trans = r->trans;
+	  continue;
+	}
+
+      if (p[-1] == '\n' && newline)
+	{
+	  s = r->newlines[s1];
+	  continue;
+	}
+
+      s = 0;
+    }
+}
+
+/* Initialize the components of a regexp that the other routines don't
+   initialize for themselves. */
+void
+reginit(r)
+     struct regexp *r;
+{
+  r->calloc = 1;
+  MALLOC(r->charsets, _charset, r->calloc);
+  r->cindex = 0;
+
+  r->talloc = 1;
+  MALLOC(r->tokens, _token, r->talloc);
+  r->tindex = r->depth = r->nleaves = r->nregexps = 0;
+
+  r->searchflag = 0;
+  r->tralloc = 0;
+}
+
+/* Parse and analyze a single string of the given length. */
+void
+regcompile(s, len, r, searchflag)
+     const char *s;
+     size_t len;
+     struct regexp *r;
+     int searchflag;
+{
+  if (case_fold)	/* dummy folding in service of regmust() */
+    {
+      char *copy;
+      int i;
+
+      copy = malloc(len);
+      if (!copy)
+	regerror("out of memory");
+      
+      /* This is a complete kludge and could potentially break
+	 \<letter> escapes . . . */
+      case_fold = 0;
+      for (i = 0; i < len; ++i)
+	if (ISUPPER(s[i]))
+	  copy[i] = tolower(s[i]);
+	else
+	  copy[i] = s[i];
+
+      reginit(r);
+      r->mustn = 0;
+      r->must[0] = '\0';
+      regparse(copy, len, r);
+      free(copy);
+      regmust(r);
+      reganalyze(r, searchflag);
+      case_fold = 1;
+      reginit(r);
+      regparse(s, len, r);
+      reganalyze(r, searchflag);
+    }
+  else
+    {
+        reginit(r);
+        regparse(s, len, r);
+        regmust(r);
+        reganalyze(r, searchflag);
+    }
+}
+
+/* Free the storage held by the components of a regexp. */
+void
+regfree(r)
+     struct regexp *r;
+{
+  int i;
+
+  free((ptr_t) r->charsets);
+  free((ptr_t) r->tokens);
+  for (i = 0; i < r->sindex; ++i)
+    free((ptr_t) r->states[i].elems.elems);
+  free((ptr_t) r->states);
+  for (i = 0; i < r->tindex; ++i)
+    if (r->follows[i].elems)
+      free((ptr_t) r->follows[i].elems);
+  free((ptr_t) r->follows);
+  for (i = 0; i < r->tralloc; ++i)
+    if (r->trans[i])
+      free((ptr_t) r->trans[i]);
+    else if (r->fails[i])
+      free((ptr_t) r->fails[i]);
+  free((ptr_t) r->realtrans);
+  free((ptr_t) r->fails);
+  free((ptr_t) r->newlines);
+}
+
+/*
+Having found the postfix representation of the regular expression,
+try to find a long sequence of characters that must appear in any line
+containing the r.e.
+Finding a "longest" sequence is beyond the scope here;
+we take an easy way out and hope for the best.
+(Take "(ab|a)b"--please.)
+
+We do a bottom-up calculation of sequences of characters that must appear
+in matches of r.e.'s represented by trees rooted at the nodes of the postfix
+representation:
+	sequences that must appear at the left of the match ("left")
+	sequences that must appear at the right of the match ("right")
+	lists of sequences that must appear somewhere in the match ("in")
+	sequences that must constitute the match ("is")
+When we get to the root of the tree, we use one of the longest of its
+calculated "in" sequences as our answer.  The sequence we find is returned in
+r->must (where "r" is the single argument passed to "regmust");
+the length of the sequence is returned in r->mustn.
+
+The sequences calculated for the various types of node (in pseudo ANSI c)
+are shown below.  "p" is the operand of unary operators (and the left-hand
+operand of binary operators); "q" is the right-hand operand of binary operators
+.
+"ZERO" means "a zero-length sequence" below.
+
+Type	left		right		is		in
+----	----		-----		--		--
+char c	# c		# c		# c		# c
+
+SET	ZERO		ZERO		ZERO		ZERO
+
+STAR	ZERO		ZERO		ZERO		ZERO
+
+QMARK	ZERO		ZERO		ZERO		ZERO
+
+PLUS	p->left		p->right	ZERO		p->in
+
+CAT	(p->is==ZERO)?	(q->is==ZERO)?	(p->is!=ZERO &&	p->in plus
+	p->left :	q->right :	q->is!=ZERO) ?	q->in plus
+	p->is##q->left	p->right##q->is	p->is##q->is :	p->right##q->left
+					ZERO
+
+OR	longest common	longest common	(do p->is and	substrings common to
+	leading		trailing	q->is have same	p->in and q->in
+	(sub)sequence	(sub)sequence	length and	
+	of p->left	of p->right	content) ?	
+	and q->left	and q->right	p->is : NULL	
+
+If there's anything else we recognize in the tree, all four sequences get set
+to zero-length sequences.  If there's something we don't recognize in the tree,
+we just return a zero-length sequence.
+
+Break ties in favor of infrequent letters (choosing 'zzz' in preference to
+'aaa')?
+
+And. . .is it here or someplace that we might ponder "optimizations" such as
+	egrep 'psi|epsilon'	->	egrep 'psi'
+	egrep 'pepsi|epsilon'	->	egrep 'epsi'
+					(Yes, we now find "epsi" as a "string
+					that must occur", but we might also
+					simplify the *entire* r.e. being sought
+)
+	grep '[c]'		->	grep 'c'
+	grep '(ab|a)b'		->	grep 'ab'
+	grep 'ab*'		->	grep 'a'
+	grep 'a*b'		->	grep 'b'
+There are several issues:
+	Is optimization easy (enough)?
+
+	Does optimization actually accomplish anything,
+	or is the automaton you get from "psi|epsilon" (for example)
+	the same as the one you get from "psi" (for example)?
+
+	Are optimizable r.e.'s likely to be used in real-life situations
+	(something like 'ab*' is probably unlikely; something like is
+	'psi|epsilon' is likelier)?
+*/
+
+static char *
+icatalloc(old, new)
+char *	old;
+char *	new;
+{
+	register char *	result;
+	register int	oldsize, newsize;
+
+	newsize = (new == NULL) ? 0 : strlen(new);
+	if (old == NULL)
+		oldsize = 0;
+	else if (newsize == 0)
+		return old;
+	else	oldsize = strlen(old);
+	if (old == NULL)
+		result = (char *) malloc(newsize + 1);
+	else	result = (char *) realloc((void *) old, oldsize + newsize + 1);
+	if (result != NULL && new != NULL)
+		(void) strcpy(result + oldsize, new);
+	return result;
+}
+
+static char *
+icpyalloc(string)
+const char *	string;
+{
+	return icatalloc((char *) NULL, string);
+}
+
+static char *
+istrstr(lookin, lookfor)
+char *		lookin;
+register char *	lookfor;
+{
+	register char *	cp;
+	register int	len;
+
+	len = strlen(lookfor);
+	for (cp = lookin; *cp != '\0'; ++cp)
+		if (strncmp(cp, lookfor, len) == 0)
+			return cp;
+	return NULL;
+}
+
+static void
+ifree(cp)
+char *	cp;
+{
+	if (cp != NULL)
+		free(cp);
+}
+
+static void
+freelist(cpp)
+register char **	cpp;
+{
+	register int	i;
+
+	if (cpp == NULL)
+		return;
+	for (i = 0; cpp[i] != NULL; ++i) {
+		free(cpp[i]);
+		cpp[i] = NULL;
+	}
+}
+
+static char **
+enlist(cpp, new, len)
+register char **	cpp;
+register char *		new;
+int len;
+{
+	register int	i, j;
+
+	if (cpp == NULL)
+		return NULL;
+	if ((new = icpyalloc(new)) == NULL) {
+		freelist(cpp);
+		return NULL;
+	}
+	new[len] = '\0';
+	/*
+	** Is there already something in the list that's new (or longer)?
+	*/
+	for (i = 0; cpp[i] != NULL; ++i)
+		if (istrstr(cpp[i], new) != NULL) {
+			free(new);
+			return cpp;
+		}
+	/*
+	** Eliminate any obsoleted strings.
+	*/
+	j = 0;
+	while (cpp[j] != NULL)
+		if (istrstr(new, cpp[j]) == NULL)
+			++j;
+		else {
+			free(cpp[j]);
+			if (--i == j)
+				break;
+			cpp[j] = cpp[i];
+			cpp[i] = 0;
+		}
+	/*
+	** Add the new string.
+	*/
+	cpp = (char **) realloc((char *) cpp, (i + 2) * sizeof *cpp);
+	if (cpp == NULL)
+		return NULL;
+	cpp[i] = new;
+	cpp[i + 1] = NULL;
+	return cpp;
+}
+
+/*
+** Given pointers to two strings,
+** return a pointer to an allocated list of their distinct common substrings.
+** Return NULL if something seems wild.
+*/
+
+static char **
+comsubs(left, right)
+char *	left;
+char *	right;
+{
+	register char **	cpp;
+	register char *		lcp;
+	register char *		rcp;
+	register int		i, len;
+
+	if (left == NULL || right == NULL)
+		return NULL;
+	cpp = (char **) malloc(sizeof *cpp);
+	if (cpp == NULL)
+		return NULL;
+	cpp[0] = NULL;
+	for (lcp = left; *lcp != '\0'; ++lcp) {
+		len = 0;
+		rcp = index(right, *lcp);
+		while (rcp != NULL) {
+			for (i = 1; lcp[i] != '\0' && lcp[i] == rcp[i]; ++i)
+				;
+			if (i > len)
+				len = i;
+			rcp = index(rcp + 1, *lcp);
+		}
+		if (len == 0)
+			continue;
+		if ((cpp = enlist(cpp, lcp, len)) == NULL)
+			break;
+	}
+	return cpp;
+}
+
+static char **
+addlists(old, new)
+char **	old;
+char **	new;
+{
+	register int	i;
+
+	if (old == NULL || new == NULL)
+		return NULL;
+	for (i = 0; new[i] != NULL; ++i) {
+		old = enlist(old, new[i], strlen(new[i]));
+		if (old == NULL)
+			break;
+	}
+	return old;
+}
+
+/*
+** Given two lists of substrings,
+** return a new list giving substrings common to both.
+*/
+
+static char **
+inboth(left, right)
+char **	left;
+char **	right;
+{
+	register char **	both;
+	register char **	temp;
+	register int		lnum, rnum;
+
+	if (left == NULL || right == NULL)
+		return NULL;
+	both = (char **) malloc(sizeof *both);
+	if (both == NULL)
+		return NULL;
+	both[0] = NULL;
+	for (lnum = 0; left[lnum] != NULL; ++lnum) {
+		for (rnum = 0; right[rnum] != NULL; ++rnum) {
+			temp = comsubs(left[lnum], right[rnum]);
+			if (temp == NULL) {
+				freelist(both);
+				return NULL;
+			}
+			both = addlists(both, temp);
+			freelist(temp);
+			if (both == NULL)
+				return NULL;
+		}
+	}
+	return both;
+}
+
+typedef struct {
+	char **	in;
+	char *	left;
+	char *	right;
+	char *	is;
+} must;
+
+static void
+resetmust(mp)
+register must *	mp;
+{
+	mp->left[0] = mp->right[0] = mp->is[0] = '\0';
+	freelist(mp->in);
+}
+
+static void
+regmust(reg)
+register struct regexp *	reg;
+{
+	register must *		musts;
+	register must *		mp;
+	register char *		result;
+	register int		ri;
+	register int		i;
+	register _token		t;
+	static must		must0;
+
+	reg->mustn = 0;
+	reg->must[0] = '\0';
+	musts = (must *) malloc((reg->tindex + 1) * sizeof *musts);
+	if (musts == NULL)
+		return;
+	mp = musts;
+	for (i = 0; i <= reg->tindex; ++i)
+		mp[i] = must0;
+	for (i = 0; i <= reg->tindex; ++i) {
+		mp[i].in = (char **) malloc(sizeof *mp[i].in);
+		mp[i].left = malloc(2);
+		mp[i].right = malloc(2);
+		mp[i].is = malloc(2);
+		if (mp[i].in == NULL || mp[i].left == NULL ||
+			mp[i].right == NULL || mp[i].is == NULL)
+				goto done;
+		mp[i].left[0] = mp[i].right[0] = mp[i].is[0] = '\0';
+		mp[i].in[0] = NULL;
+	}
+	result = "";
+#ifdef DEBUG
+	fprintf(stderr, "regmust:\n");
+	for (i = 0; i < reg->tindex; ++i) {
+		fprintf(stderr, " %d:", i);
+		prtok(reg->tokens[i]);
+	}
+	putc('\n', stderr);
+#endif
+	for (ri = 0; ri < reg->tindex; ++ri) {
+		switch (t = reg->tokens[ri]) {
+		case _ALLBEGLINE:
+		case _ALLENDLINE:
+		case _LPAREN:
+		case _RPAREN:
+			goto done;		/* "cannot happen" */
+		case _EMPTY:
+		case _BEGLINE:
+		case _ENDLINE:
+		case _BEGWORD:
+		case _ENDWORD:
+		case _LIMWORD:
+		case _NOTLIMWORD:
+		case _BACKREF:
+			resetmust(mp);
+			break;
+		case _STAR:
+		case _QMARK:
+			if (mp <= musts)
+				goto done;	/* "cannot happen" */
+			--mp;
+			resetmust(mp);
+			break;
+		case _OR:
+			if (mp < &musts[2])
+				goto done;	/* "cannot happen" */
+			{
+				register char **	new;
+				register must *		lmp;
+				register must *		rmp;
+				register int		j, ln, rn, n;
+
+				rmp = --mp;
+				lmp = --mp;
+				/* Guaranteed to be.  Unlikely, but. . . */
+				if (strcmp(lmp->is, rmp->is) != 0)
+					lmp->is[0] = '\0';
+				/* Left side--easy */
+				i = 0;
+				while (lmp->left[i] != '\0' &&
+					lmp->left[i] == rmp->left[i])
+						++i;
+				lmp->left[i] = '\0';
+				/* Right side */
+				ln = strlen(lmp->right);
+				rn = strlen(rmp->right);
+				n = ln;
+				if (n > rn)
+					n = rn;
+				for (i = 0; i < n; ++i)
+					if (lmp->right[ln - i - 1] !=
+					    rmp->right[rn - i - 1])
+						break;
+				for (j = 0; j < i; ++j)
+					lmp->right[j] =
+						lmp->right[(ln - i) + j];
+				lmp->right[j] = '\0';
+				new = inboth(lmp->in, rmp->in);
+				if (new == NULL)
+					goto done;
+				freelist(lmp->in);
+				free((char *) lmp->in);
+				lmp->in = new;
+			}
+			break;
+		case _PLUS:
+			if (mp <= musts)
+				goto done;	/* "cannot happen" */
+			--mp;
+			mp->is[0] = '\0';
+			break;
+		case _END:
+			if (mp != &musts[1])
+				goto done;	/* "cannot happen" */
+			for (i = 0; musts[0].in[i] != NULL; ++i)
+				if (strlen(musts[0].in[i]) > strlen(result))
+					result = musts[0].in[i];
+			goto done;
+		case _CAT:
+			if (mp < &musts[2])
+				goto done;	/* "cannot happen" */
+			{
+				register must *	lmp;
+				register must *	rmp;
+
+				rmp = --mp;
+				lmp = --mp;
+				/*
+				** In.  Everything in left, plus everything in
+				** right, plus catenation of
+				** left's right and right's left.
+				*/
+				lmp->in = addlists(lmp->in, rmp->in);
+				if (lmp->in == NULL)
+					goto done;
+				if (lmp->right[0] != '\0' &&
+					rmp->left[0] != '\0') {
+						register char *	tp;
+
+						tp = icpyalloc(lmp->right);
+						if (tp == NULL)
+							goto done;
+						tp = icatalloc(tp, rmp->left);
+						if (tp == NULL)
+							goto done;
+						lmp->in = enlist(lmp->in, tp,
+							strlen(tp));
+						free(tp);
+						if (lmp->in == NULL)
+							goto done;
+				}
+				/* Left-hand */
+				if (lmp->is[0] != '\0') {
+					lmp->left = icatalloc(lmp->left,
+						rmp->left);
+					if (lmp->left == NULL)
+						goto done;
+				}
+				/* Right-hand */
+				if (rmp->is[0] == '\0')
+					lmp->right[0] = '\0';
+				lmp->right = icatalloc(lmp->right, rmp->right);
+				if (lmp->right == NULL)
+					goto done;
+				/* Guaranteed to be */
+				if (lmp->is[0] != '\0' && rmp->is[0] != '\0') {
+					lmp->is = icatalloc(lmp->is, rmp->is);
+					if (lmp->is == NULL)
+						goto done;
+				} else
+					lmp->is[0] = '\0';
+			}
+			break;
+		default:
+			if (t < _END) {
+				/* "cannot happen" */
+				goto done;
+			} else if (t == '\0') {
+				/* not on *my* shift */
+				goto done;
+			} else if (t >= _SET) {
+				/* easy enough */
+				resetmust(mp);
+			} else {
+				/* plain character */
+				resetmust(mp);
+				mp->is[0] = mp->left[0] = mp->right[0] = t;
+				mp->is[1] = mp->left[1] = mp->right[1] = '\0';
+				mp->in = enlist(mp->in, mp->is, 1);
+				if (mp->in == NULL)
+					goto done;
+			}
+			break;
+		}
+#ifdef DEBUG
+		fprintf(stderr, " node: %d:", ri);
+		prtok(reg->tokens[ri]);
+		fprintf(stderr, "\n  in:");
+		for (i = 0; mp->in[i]; ++i)
+			fprintf(stderr, " \"%s\"", mp->in[i]);
+		fprintf(stderr, "\n  is: \"%s\"\n", mp->is);
+		fprintf(stderr, "  left: \"%s\"\n", mp->left);
+		fprintf(stderr, "  right: \"%s\"\n", mp->right);
+#endif
+		++mp;
+	}
+done:
+	(void) strncpy(reg->must, result, MUST_MAX - 1);
+	reg->must[MUST_MAX - 1] = '\0';
+	reg->mustn = strlen(reg->must);
+	mp = musts;
+	for (i = 0; i <= reg->tindex; ++i) {
+		freelist(mp[i].in);
+		ifree((char *) mp[i].in);
+		ifree(mp[i].left);
+		ifree(mp[i].right);
+		ifree(mp[i].is);
+	}
+	free((char *) mp);
+}
diff --git a/gnu/usr.bin/grep/dfa.h b/gnu/usr.bin/grep/dfa.h
new file mode 100644
index 000000000000..33e4bf14195a
--- /dev/null
+++ b/gnu/usr.bin/grep/dfa.h
@@ -0,0 +1,450 @@
+/* dfa.h - declarations for GNU deterministic regexp compiler
+   Copyright (C) 1988 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  */
+
+/* Written June, 1988 by Mike Haertel */
+
+#ifdef STDC_HEADERS
+
+#include <stddef.h>
+#include <stdlib.h>
+
+#else /* !STDC_HEADERS */
+
+#define const
+#include <sys/types.h>		/* For size_t.  */
+extern char *calloc(), *malloc(), *realloc();
+extern void free();
+
+#ifndef NULL
+#define NULL 0
+#endif
+
+#endif /* ! STDC_HEADERS */
+
+#include <ctype.h>
+#ifndef isascii
+#define ISALNUM(c) isalnum(c)
+#define ISALPHA(c) isalpha(c)
+#define ISUPPER(c) isupper(c)
+#define ISLOWER(c) islower(c)
+#else
+#define ISALNUM(c) (isascii(c) && isalnum(c))
+#define ISALPHA(c) (isascii(c) && isalpha(c))
+#define ISUPPER(c) (isascii(c) && isupper(c))
+#define ISLOWER(c) (isascii(c) && islower(c))
+#endif
+
+/* 1 means plain parentheses serve as grouping, and backslash
+     parentheses are needed for literal searching.
+   0 means backslash-parentheses are grouping, and plain parentheses
+     are for literal searching.  */
+#define RE_NO_BK_PARENS 1
+
+/* 1 means plain | serves as the "or"-operator, and \| is a literal.
+   0 means \| serves as the "or"-operator, and | is a literal.  */
+#define RE_NO_BK_VBAR 2
+
+/* 0 means plain + or ? serves as an operator, and \+, \? are literals.
+   1 means \+, \? are operators and plain +, ? are literals.  */
+#define RE_BK_PLUS_QM 4
+
+/* 1 means | binds tighter than ^ or $.
+   0 means the contrary.  */
+#define RE_TIGHT_VBAR 8
+
+/* 1 means treat \n as an _OR operator
+   0 means treat it as a normal character */
+#define RE_NEWLINE_OR 16
+
+/* 0 means that a special characters (such as *, ^, and $) always have
+     their special meaning regardless of the surrounding context.
+   1 means that special characters may act as normal characters in some
+     contexts.  Specifically, this applies to:
+	^ - only special at the beginning, or after ( or |
+	$ - only special at the end, or before ) or |
+	*, +, ? - only special when not after the beginning, (, or | */
+#define RE_CONTEXT_INDEP_OPS 32
+
+/* Now define combinations of bits for the standard possibilities.  */
+#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
+#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
+#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
+#define RE_SYNTAX_EMACS 0
+
+/* Number of bits in an unsigned char. */
+#define CHARBITS 8
+
+/* First integer value that is greater than any character code. */
+#define _NOTCHAR (1 << CHARBITS)
+
+/* INTBITS need not be exact, just a lower bound. */
+#define INTBITS (CHARBITS * sizeof (int))
+
+/* Number of ints required to hold a bit for every character. */
+#define _CHARSET_INTS ((_NOTCHAR + INTBITS - 1) / INTBITS)
+
+/* Sets of unsigned characters are stored as bit vectors in arrays of ints. */
+typedef int _charset[_CHARSET_INTS];
+
+/* The regexp is parsed into an array of tokens in postfix form.  Some tokens
+   are operators and others are terminal symbols.  Most (but not all) of these
+   codes are returned by the lexical analyzer. */
+#if __STDC__
+
+typedef enum
+{
+  _END = -1,			/* _END is a terminal symbol that matches the
+				   end of input; any value of _END or less in
+				   the parse tree is such a symbol.  Accepting
+				   states of the DFA are those that would have
+				   a transition on _END. */
+
+  /* Ordinary character values are terminal symbols that match themselves. */
+
+  _EMPTY = _NOTCHAR,		/* _EMPTY is a terminal symbol that matches
+				   the empty string. */
+
+  _BACKREF,			/* _BACKREF is generated by \<digit>; it
+				   it not completely handled.  If the scanner
+				   detects a transition on backref, it returns
+				   a kind of "semi-success" indicating that
+				   the match will have to be verified with
+				   a backtracking matcher. */
+
+  _BEGLINE,			/* _BEGLINE is a terminal symbol that matches
+				   the empty string if it is at the beginning
+				   of a line. */
+
+  _ALLBEGLINE,			/* _ALLBEGLINE is a terminal symbol that
+				   matches the empty string if it is at the
+				   beginning of a line; _ALLBEGLINE applies
+				   to the entire regexp and can only occur
+				   as the first token thereof.  _ALLBEGLINE
+				   never appears in the parse tree; a _BEGLINE
+				   is prepended with _CAT to the entire
+				   regexp instead. */
+
+  _ENDLINE,			/* _ENDLINE is a terminal symbol that matches
+				   the empty string if it is at the end of
+				   a line. */
+
+  _ALLENDLINE,			/* _ALLENDLINE is to _ENDLINE as _ALLBEGLINE
+				   is to _BEGLINE. */
+
+  _BEGWORD,			/* _BEGWORD is a terminal symbol that matches
+				   the empty string if it is at the beginning
+				   of a word. */
+
+  _ENDWORD,			/* _ENDWORD is a terminal symbol that matches
+				   the empty string if it is at the end of
+				   a word. */
+
+  _LIMWORD,			/* _LIMWORD is a terminal symbol that matches
+				   the empty string if it is at the beginning
+				   or the end of a word. */
+
+  _NOTLIMWORD,			/* _NOTLIMWORD is a terminal symbol that
+				   matches the empty string if it is not at
+				   the beginning or end of a word. */
+
+  _QMARK,			/* _QMARK is an operator of one argument that
+				   matches zero or one occurences of its
+				   argument. */
+
+  _STAR,			/* _STAR is an operator of one argument that
+				   matches the Kleene closure (zero or more
+				   occurrences) of its argument. */
+
+  _PLUS,			/* _PLUS is an operator of one argument that
+				   matches the positive closure (one or more
+				   occurrences) of its argument. */
+
+  _CAT,				/* _CAT is an operator of two arguments that
+				   matches the concatenation of its
+				   arguments.  _CAT is never returned by the
+				   lexical analyzer. */
+
+  _OR,				/* _OR is an operator of two arguments that
+				   matches either of its arguments. */
+
+  _LPAREN,			/* _LPAREN never appears in the parse tree,
+				   it is only a lexeme. */
+
+  _RPAREN,			/* _RPAREN never appears in the parse tree. */
+
+  _SET				/* _SET and (and any value greater) is a
+				   terminal symbol that matches any of a
+				   class of characters. */
+} _token;
+
+#else /* ! __STDC__ */
+
+typedef short _token;
+
+#define _END -1
+#define _EMPTY _NOTCHAR
+#define _BACKREF (_EMPTY + 1)
+#define _BEGLINE (_EMPTY + 2)
+#define _ALLBEGLINE (_EMPTY + 3)
+#define _ENDLINE (_EMPTY + 4)
+#define _ALLENDLINE (_EMPTY + 5)
+#define _BEGWORD (_EMPTY + 6)
+#define _ENDWORD (_EMPTY + 7)
+#define _LIMWORD (_EMPTY + 8)
+#define _NOTLIMWORD (_EMPTY + 9)
+#define _QMARK (_EMPTY + 10)
+#define _STAR (_EMPTY + 11)
+#define _PLUS (_EMPTY + 12)
+#define _CAT (_EMPTY + 13)
+#define _OR (_EMPTY + 14)
+#define _LPAREN (_EMPTY + 15)
+#define _RPAREN (_EMPTY + 16)
+#define _SET (_EMPTY + 17)
+
+#endif /* ! __STDC__ */
+
+/* Sets are stored in an array in the compiled regexp; the index of the
+   array corresponding to a given set token is given by _SET_INDEX(t). */
+#define _SET_INDEX(t) ((t) - _SET)
+
+/* Sometimes characters can only be matched depending on the surrounding
+   context.  Such context decisions depend on what the previous character
+   was, and the value of the current (lookahead) character.  Context
+   dependent constraints are encoded as 8 bit integers.  Each bit that
+   is set indicates that the constraint succeeds in the corresponding
+   context.
+
+   bit 7 - previous and current are newlines
+   bit 6 - previous was newline, current isn't
+   bit 5 - previous wasn't newline, current is
+   bit 4 - neither previous nor current is a newline
+   bit 3 - previous and current are word-constituents
+   bit 2 - previous was word-constituent, current isn't
+   bit 1 - previous wasn't word-constituent, current is
+   bit 0 - neither previous nor current is word-constituent
+
+   Word-constituent characters are those that satisfy isalnum().
+
+   The macro _SUCCEEDS_IN_CONTEXT determines whether a a given constraint
+   succeeds in a particular context.  Prevn is true if the previous character
+   was a newline, currn is true if the lookahead character is a newline.
+   Prevl and currl similarly depend upon whether the previous and current
+   characters are word-constituent letters. */
+#define _MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
+  ((constraint) & 1 << ((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4)
+#define _MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \
+  ((constraint) & 1 << ((prevl) ? 2 : 0) + ((currl) ? 1 : 0))
+#define _SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \
+  (_MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn)		     \
+   && _MATCHES_LETTER_CONTEXT(constraint, prevl, currl))
+
+/* The following macros give information about what a constraint depends on. */
+#define _PREV_NEWLINE_DEPENDENT(constraint) \
+  (((constraint) & 0xc0) >> 2 != ((constraint) & 0x30))
+#define _PREV_LETTER_DEPENDENT(constraint) \
+  (((constraint) & 0x0c) >> 2 != ((constraint) & 0x03))
+
+/* Tokens that match the empty string subject to some constraint actually
+   work by applying that constraint to determine what may follow them,
+   taking into account what has gone before.  The following values are
+   the constraints corresponding to the special tokens previously defined. */
+#define _NO_CONSTRAINT 0xff
+#define _BEGLINE_CONSTRAINT 0xcf
+#define _ENDLINE_CONSTRAINT 0xaf
+#define _BEGWORD_CONSTRAINT 0xf2
+#define _ENDWORD_CONSTRAINT 0xf4
+#define _LIMWORD_CONSTRAINT 0xf6
+#define _NOTLIMWORD_CONSTRAINT 0xf9
+
+/* States of the recognizer correspond to sets of positions in the parse
+   tree, together with the constraints under which they may be matched.
+   So a position is encoded as an index into the parse tree together with
+   a constraint. */
+typedef struct
+{
+  unsigned index;		/* Index into the parse array. */
+  unsigned constraint;		/* Constraint for matching this position. */
+} _position;
+
+/* Sets of positions are stored as arrays. */
+typedef struct
+{
+  _position *elems;		/* Elements of this position set. */
+  int nelem;			/* Number of elements in this set. */
+} _position_set;
+
+/* A state of the regexp consists of a set of positions, some flags,
+   and the token value of the lowest-numbered position of the state that
+   contains an _END token. */
+typedef struct
+{
+  int hash;			/* Hash of the positions of this state. */
+  _position_set elems;		/* Positions this state could match. */
+  char newline;			/* True if previous state matched newline. */
+  char letter;			/* True if previous state matched a letter. */
+  char backref;			/* True if this state matches a \<digit>. */
+  unsigned char constraint;	/* Constraint for this state to accept. */
+  int first_end;		/* Token value of the first _END in elems. */
+} _dfa_state;
+
+/* If an r.e. is at most MUST_MAX characters long, we look for a string which
+   must appear in it; whatever's found is dropped into the struct reg. */
+
+#define MUST_MAX	50
+
+/* A compiled regular expression. */
+struct regexp
+{
+  /* Stuff built by the scanner. */
+  _charset *charsets;		/* Array of character sets for _SET tokens. */
+  int cindex;			/* Index for adding new charsets. */
+  int calloc;			/* Number of charsets currently allocated. */
+
+  /* Stuff built by the parser. */
+  _token *tokens;		/* Postfix parse array. */
+  int tindex;			/* Index for adding new tokens. */
+  int talloc;			/* Number of tokens currently allocated. */
+  int depth;			/* Depth required of an evaluation stack
+				   used for depth-first traversal of the
+				   parse tree. */
+  int nleaves;			/* Number of leaves on the parse tree. */
+  int nregexps;			/* Count of parallel regexps being built
+				   with regparse(). */
+
+  /* Stuff owned by the state builder. */
+  _dfa_state *states;		/* States of the regexp. */
+  int sindex;			/* Index for adding new states. */
+  int salloc;			/* Number of states currently allocated. */
+
+  /* Stuff built by the structure analyzer. */
+  _position_set *follows;	/* Array of follow sets, indexed by position
+				   index.  The follow of a position is the set
+				   of positions containing characters that
+				   could conceivably follow a character
+				   matching the given position in a string
+				   matching the regexp.  Allocated to the
+				   maximum possible position index. */
+  int searchflag;		/* True if we are supposed to build a searching
+				   as opposed to an exact matcher.  A searching
+				   matcher finds the first and shortest string
+				   matching a regexp anywhere in the buffer,
+				   whereas an exact matcher finds the longest
+				   string matching, but anchored to the
+				   beginning of the buffer. */
+
+  /* Stuff owned by the executor. */
+  int tralloc;			/* Number of transition tables that have
+				   slots so far. */
+  int trcount;			/* Number of transition tables that have
+				   actually been built. */
+  int **trans;			/* Transition tables for states that can
+				   never accept.  If the transitions for a
+				   state have not yet been computed, or the
+				   state could possibly accept, its entry in
+				   this table is NULL. */
+  int **realtrans;		/* Trans always points to realtrans + 1; this
+				   is so trans[-1] can contain NULL. */
+  int **fails;			/* Transition tables after failing to accept
+				   on a state that potentially could do so. */
+  int *success;			/* Table of acceptance conditions used in
+				   regexecute and computed in build_state. */
+  int *newlines;		/* Transitions on newlines.  The entry for a
+				   newline in any transition table is always
+				   -1 so we can count lines without wasting
+				   too many cycles.  The transition for a
+				   newline is stored separately and handled
+				   as a special case.  Newline is also used
+				   as a sentinel at the end of the buffer. */
+  char must[MUST_MAX];
+  int mustn;
+};
+
+/* Some macros for user access to regexp internals. */
+
+/* ACCEPTING returns true if s could possibly be an accepting state of r. */
+#define ACCEPTING(s, r) ((r).states[s].constraint)
+
+/* ACCEPTS_IN_CONTEXT returns true if the given state accepts in the
+   specified context. */
+#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, reg) \
+  _SUCCEEDS_IN_CONTEXT((reg).states[state].constraint,		   \
+		       prevn, currn, prevl, currl)
+
+/* FIRST_MATCHING_REGEXP returns the index number of the first of parallel
+   regexps that a given state could accept.  Parallel regexps are numbered
+   starting at 1. */
+#define FIRST_MATCHING_REGEXP(state, reg) (-(reg).states[state].first_end)
+
+/* Entry points. */
+
+#if __STDC__
+
+/* Regsyntax() takes two arguments; the first sets the syntax bits described
+   earlier in this file, and the second sets the case-folding flag. */
+extern void regsyntax(int, int);
+
+/* Compile the given string of the given length into the given struct regexp.
+   Final argument is a flag specifying whether to build a searching or an
+   exact matcher. */
+extern void regcompile(const char *, size_t, struct regexp *, int);
+
+/* Execute the given struct regexp on the buffer of characters.  The
+   first char * points to the beginning, and the second points to the
+   first character after the end of the buffer, which must be a writable
+   place so a sentinel end-of-buffer marker can be stored there.  The
+   second-to-last argument is a flag telling whether to allow newlines to
+   be part of a string matching the regexp.  The next-to-last argument,
+   if non-NULL, points to a place to increment every time we see a
+   newline.  The final argument, if non-NULL, points to a flag that will
+   be set if further examination by a backtracking matcher is needed in
+   order to verify backreferencing; otherwise the flag will be cleared.
+   Returns NULL if no match is found, or a pointer to the first
+   character after the first & shortest matching string in the buffer. */
+extern char *regexecute(struct regexp *, char *, char *, int, int *, int *);
+
+/* Free the storage held by the components of a struct regexp. */
+extern void regfree(struct regexp *);
+
+/* Entry points for people who know what they're doing. */
+
+/* Initialize the components of a struct regexp. */
+extern void reginit(struct regexp *);
+
+/* Incrementally parse a string of given length into a struct regexp. */
+extern void regparse(const char *, size_t, struct regexp *);
+
+/* Analyze a parsed regexp; second argument tells whether to build a searching
+   or an exact matcher. */
+extern void reganalyze(struct regexp *, int);
+
+/* Compute, for each possible character, the transitions out of a given
+   state, storing them in an array of integers. */
+extern void regstate(int, struct regexp *, int []);
+
+/* Error handling. */
+
+/* Regerror() is called by the regexp routines whenever an error occurs.  It
+   takes a single argument, a NUL-terminated string describing the error.
+   The default regerror() prints the error message to stderr and exits.
+   The user can provide a different regfree() if so desired. */
+extern void regerror(const char *);
+
+#else /* ! __STDC__ */
+extern void regsyntax(), regcompile(), regfree(), reginit(), regparse();
+extern void reganalyze(), regstate(), regerror();
+extern char *regexecute();
+#endif /* ! __STDC__ */
diff --git a/gnu/usr.bin/grep/getopt.c b/gnu/usr.bin/grep/getopt.c
new file mode 100644
index 000000000000..0661bdf76289
--- /dev/null
+++ b/gnu/usr.bin/grep/getopt.c
@@ -0,0 +1,678 @@
+/* Getopt for GNU.
+   NOTE: getopt is now part of the C library, so if you don't know what
+   "Keep this file name-space clean" means, talk to roland@gnu.ai.mit.edu
+   before changing it!
+
+   Copyright (C) 1987, 88, 89, 90, 91, 1992 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  */
+
+/* AIX requires this to be the first thing in the file. */
+#ifdef __GNUC__
+#define alloca __builtin_alloca
+#else /* not __GNUC__ */
+#if defined(sparc) && !defined(USG) && !defined(SVR4) && !defined(__svr4__)
+#include <alloca.h>
+#else
+#ifdef _AIX
+ #pragma alloca
+#else
+char *alloca ();
+#endif
+#endif /* sparc */
+#endif /* not __GNUC__ */
+
+#ifdef	LIBC
+/* For when compiled as part of the GNU C library.  */
+#include <ansidecl.h>
+#endif
+
+#include <stdio.h>
+
+/* This needs to come after some library #include
+   to get __GNU_LIBRARY__ defined.  */
+#ifdef	__GNU_LIBRARY__
+#undef	alloca
+#include <stdlib.h>
+#include <string.h>
+#else	/* Not GNU C library.  */
+#define	__alloca	alloca
+#endif	/* GNU C library.  */
+
+
+#ifndef __STDC__
+#define const
+#endif
+
+/* If GETOPT_COMPAT is defined, `+' as well as `--' can introduce a
+   long-named option.  Because this is not POSIX.2 compliant, it is
+   being phased out. */
+#define GETOPT_COMPAT
+
+/* This version of `getopt' appears to the caller like standard Unix `getopt'
+   but it behaves differently for the user, since it allows the user
+   to intersperse the options with the other arguments.
+
+   As `getopt' works, it permutes the elements of ARGV so that,
+   when it is done, all the options precede everything else.  Thus
+   all application programs are extended to handle flexible argument order.
+
+   Setting the environment variable POSIXLY_CORRECT disables permutation.
+   Then the behavior is completely standard.
+
+   GNU application programs can use a third alternative mode in which
+   they can distinguish the relative order of options and other arguments.  */
+
+#include "getopt.h"
+
+/* For communication from `getopt' to the caller.
+   When `getopt' finds an option that takes an argument,
+   the argument value is returned here.
+   Also, when `ordering' is RETURN_IN_ORDER,
+   each non-option ARGV-element is returned here.  */
+
+char *optarg = 0;
+
+/* Index in ARGV of the next element to be scanned.
+   This is used for communication to and from the caller
+   and for communication between successive calls to `getopt'.
+
+   On entry to `getopt', zero means this is the first call; initialize.
+
+   When `getopt' returns EOF, this is the index of the first of the
+   non-option elements that the caller should itself scan.
+
+   Otherwise, `optind' communicates from one call to the next
+   how much of ARGV has been scanned so far.  */
+
+int optind = 0;
+
+/* The next char to be scanned in the option-element
+   in which the last option character we returned was found.
+   This allows us to pick up the scan where we left off.
+
+   If this is zero, or a null string, it means resume the scan
+   by advancing to the next ARGV-element.  */
+
+static char *nextchar;
+
+/* Callers store zero here to inhibit the error message
+   for unrecognized options.  */
+
+int opterr = 1;
+
+/* Describe how to deal with options that follow non-option ARGV-elements.
+
+   If the caller did not specify anything,
+   the default is REQUIRE_ORDER if the environment variable
+   POSIXLY_CORRECT is defined, PERMUTE otherwise.
+
+   REQUIRE_ORDER means don't recognize them as options;
+   stop option processing when the first non-option is seen.
+   This is what Unix does.
+   This mode of operation is selected by either setting the environment
+   variable POSIXLY_CORRECT, or using `+' as the first character
+   of the list of option characters.
+
+   PERMUTE is the default.  We permute the contents of ARGV as we scan,
+   so that eventually all the non-options are at the end.  This allows options
+   to be given in any order, even with programs that were not written to
+   expect this.
+
+   RETURN_IN_ORDER is an option available to programs that were written
+   to expect options and other ARGV-elements in any order and that care about
+   the ordering of the two.  We describe each non-option ARGV-element
+   as if it were the argument of an option with character code 1.
+   Using `-' as the first character of the list of option characters
+   selects this mode of operation.
+
+   The special argument `--' forces an end of option-scanning regardless
+   of the value of `ordering'.  In the case of RETURN_IN_ORDER, only
+   `--' can cause `getopt' to return EOF with `optind' != ARGC.  */
+
+static enum
+{
+  REQUIRE_ORDER, PERMUTE, RETURN_IN_ORDER
+} ordering;
+
+#ifdef	__GNU_LIBRARY__
+#include <string.h>
+#define	my_index	strchr
+#define	my_bcopy(src, dst, n)	memcpy ((dst), (src), (n))
+#else
+
+/* Avoid depending on library functions or files
+   whose names are inconsistent.  */
+
+char *getenv ();
+
+static char *
+my_index (string, chr)
+     char *string;
+     int chr;
+{
+  while (*string)
+    {
+      if (*string == chr)
+	return string;
+      string++;
+    }
+  return 0;
+}
+
+static void
+my_bcopy (from, to, size)
+     char *from, *to;
+     int size;
+{
+  int i;
+  for (i = 0; i < size; i++)
+    to[i] = from[i];
+}
+#endif				/* GNU C library.  */
+
+/* Handle permutation of arguments.  */
+
+/* Describe the part of ARGV that contains non-options that have
+   been skipped.  `first_nonopt' is the index in ARGV of the first of them;
+   `last_nonopt' is the index after the last of them.  */
+
+static int first_nonopt;
+static int last_nonopt;
+
+/* Exchange two adjacent subsequences of ARGV.
+   One subsequence is elements [first_nonopt,last_nonopt)
+   which contains all the non-options that have been skipped so far.
+   The other is elements [last_nonopt,optind), which contains all
+   the options processed since those non-options were skipped.
+
+   `first_nonopt' and `last_nonopt' are relocated so that they describe
+   the new indices of the non-options in ARGV after they are moved.  */
+
+static void
+exchange (argv)
+     char **argv;
+{
+  int nonopts_size = (last_nonopt - first_nonopt) * sizeof (char *);
+  char **temp = (char **) __alloca (nonopts_size);
+
+  /* Interchange the two blocks of data in ARGV.  */
+
+  my_bcopy (&argv[first_nonopt], temp, nonopts_size);
+  my_bcopy (&argv[last_nonopt], &argv[first_nonopt],
+	    (optind - last_nonopt) * sizeof (char *));
+  my_bcopy (temp, &argv[first_nonopt + optind - last_nonopt], nonopts_size);
+
+  /* Update records for the slots the non-options now occupy.  */
+
+  first_nonopt += (optind - last_nonopt);
+  last_nonopt = optind;
+}
+
+/* Scan elements of ARGV (whose length is ARGC) for option characters
+   given in OPTSTRING.
+
+   If an element of ARGV starts with '-', and is not exactly "-" or "--",
+   then it is an option element.  The characters of this element
+   (aside from the initial '-') are option characters.  If `getopt'
+   is called repeatedly, it returns successively each of the option characters
+   from each of the option elements.
+
+   If `getopt' finds another option character, it returns that character,
+   updating `optind' and `nextchar' so that the next call to `getopt' can
+   resume the scan with the following option character or ARGV-element.
+
+   If there are no more option characters, `getopt' returns `EOF'.
+   Then `optind' is the index in ARGV of the first ARGV-element
+   that is not an option.  (The ARGV-elements have been permuted
+   so that those that are not options now come last.)
+
+   OPTSTRING is a string containing the legitimate option characters.
+   If an option character is seen that is not listed in OPTSTRING,
+   return '?' after printing an error message.  If you set `opterr' to
+   zero, the error message is suppressed but we still return '?'.
+
+   If a char in OPTSTRING is followed by a colon, that means it wants an arg,
+   so the following text in the same ARGV-element, or the text of the following
+   ARGV-element, is returned in `optarg'.  Two colons mean an option that
+   wants an optional arg; if there is text in the current ARGV-element,
+   it is returned in `optarg', otherwise `optarg' is set to zero.
+
+   If OPTSTRING starts with `-' or `+', it requests different methods of
+   handling the non-option ARGV-elements.
+   See the comments about RETURN_IN_ORDER and REQUIRE_ORDER, above.
+
+   Long-named options begin with `--' instead of `-'.
+   Their names may be abbreviated as long as the abbreviation is unique
+   or is an exact match for some defined option.  If they have an
+   argument, it follows the option name in the same ARGV-element, separated
+   from the option name by a `=', or else the in next ARGV-element.
+   When `getopt' finds a long-named option, it returns 0 if that option's
+   `flag' field is nonzero, the value of the option's `val' field
+   if the `flag' field is zero.
+
+   The elements of ARGV aren't really const, because we permute them.
+   But we pretend they're const in the prototype to be compatible
+   with other systems.
+
+   LONGOPTS is a vector of `struct option' terminated by an
+   element containing a name which is zero.
+
+   LONGIND returns the index in LONGOPT of the long-named option found.
+   It is only valid when a long-named option has been found by the most
+   recent call.
+
+   If LONG_ONLY is nonzero, '-' as well as '--' can introduce
+   long-named options.  */
+
+int
+_getopt_internal (argc, argv, optstring, longopts, longind, long_only)
+     int argc;
+     char *const *argv;
+     const char *optstring;
+     const struct option *longopts;
+     int *longind;
+     int long_only;
+{
+  int option_index;
+
+  optarg = 0;
+
+  /* Initialize the internal data when the first call is made.
+     Start processing options with ARGV-element 1 (since ARGV-element 0
+     is the program name); the sequence of previously skipped
+     non-option ARGV-elements is empty.  */
+
+  if (optind == 0)
+    {
+      first_nonopt = last_nonopt = optind = 1;
+
+      nextchar = NULL;
+
+      /* Determine how to handle the ordering of options and nonoptions.  */
+
+      if (optstring[0] == '-')
+	{
+	  ordering = RETURN_IN_ORDER;
+	  ++optstring;
+	}
+      else if (optstring[0] == '+')
+	{
+	  ordering = REQUIRE_ORDER;
+	  ++optstring;
+	}
+      else if (getenv ("POSIXLY_CORRECT") != NULL)
+	ordering = REQUIRE_ORDER;
+      else
+	ordering = PERMUTE;
+    }
+
+  if (nextchar == NULL || *nextchar == '\0')
+    {
+      if (ordering == PERMUTE)
+	{
+	  /* If we have just processed some options following some non-options,
+	     exchange them so that the options come first.  */
+
+	  if (first_nonopt != last_nonopt && last_nonopt != optind)
+	    exchange ((char **) argv);
+	  else if (last_nonopt != optind)
+	    first_nonopt = optind;
+
+	  /* Now skip any additional non-options
+	     and extend the range of non-options previously skipped.  */
+
+	  while (optind < argc
+		 && (argv[optind][0] != '-' || argv[optind][1] == '\0')
+#ifdef GETOPT_COMPAT
+		 && (longopts == NULL
+		     || argv[optind][0] != '+' || argv[optind][1] == '\0')
+#endif				/* GETOPT_COMPAT */
+		 )
+	    optind++;
+	  last_nonopt = optind;
+	}
+
+      /* Special ARGV-element `--' means premature end of options.
+	 Skip it like a null option,
+	 then exchange with previous non-options as if it were an option,
+	 then skip everything else like a non-option.  */
+
+      if (optind != argc && !strcmp (argv[optind], "--"))
+	{
+	  optind++;
+
+	  if (first_nonopt != last_nonopt && last_nonopt != optind)
+	    exchange ((char **) argv);
+	  else if (first_nonopt == last_nonopt)
+	    first_nonopt = optind;
+	  last_nonopt = argc;
+
+	  optind = argc;
+	}
+
+      /* If we have done all the ARGV-elements, stop the scan
+	 and back over any non-options that we skipped and permuted.  */
+
+      if (optind == argc)
+	{
+	  /* Set the next-arg-index to point at the non-options
+	     that we previously skipped, so the caller will digest them.  */
+	  if (first_nonopt != last_nonopt)
+	    optind = first_nonopt;
+	  return EOF;
+	}
+
+      /* If we have come to a non-option and did not permute it,
+	 either stop the scan or describe it to the caller and pass it by.  */
+
+      if ((argv[optind][0] != '-' || argv[optind][1] == '\0')
+#ifdef GETOPT_COMPAT
+	  && (longopts == NULL
+	      || argv[optind][0] != '+' || argv[optind][1] == '\0')
+#endif				/* GETOPT_COMPAT */
+	  )
+	{
+	  if (ordering == REQUIRE_ORDER)
+	    return EOF;
+	  optarg = argv[optind++];
+	  return 1;
+	}
+
+      /* We have found another option-ARGV-element.
+	 Start decoding its characters.  */
+
+      nextchar = (argv[optind] + 1
+		  + (longopts != NULL && argv[optind][1] == '-'));
+    }
+
+  if (longopts != NULL
+      && ((argv[optind][0] == '-'
+	   && (argv[optind][1] == '-' || long_only))
+#ifdef GETOPT_COMPAT
+	  || argv[optind][0] == '+'
+#endif				/* GETOPT_COMPAT */
+	  ))
+    {
+      const struct option *p;
+      char *s = nextchar;
+      int exact = 0;
+      int ambig = 0;
+      const struct option *pfound = NULL;
+      int indfound;
+
+      while (*s && *s != '=')
+	s++;
+
+      /* Test all options for either exact match or abbreviated matches.  */
+      for (p = longopts, option_index = 0; p->name;
+	   p++, option_index++)
+	if (!strncmp (p->name, nextchar, s - nextchar))
+	  {
+	    if (s - nextchar == strlen (p->name))
+	      {
+		/* Exact match found.  */
+		pfound = p;
+		indfound = option_index;
+		exact = 1;
+		break;
+	      }
+	    else if (pfound == NULL)
+	      {
+		/* First nonexact match found.  */
+		pfound = p;
+		indfound = option_index;
+	      }
+	    else
+	      /* Second nonexact match found.  */
+	      ambig = 1;
+	  }
+
+      if (ambig && !exact)
+	{
+	  if (opterr)
+	    fprintf (stderr, "%s: option `%s' is ambiguous\n",
+		     argv[0], argv[optind]);
+	  nextchar += strlen (nextchar);
+	  optind++;
+	  return '?';
+	}
+
+      if (pfound != NULL)
+	{
+	  option_index = indfound;
+	  optind++;
+	  if (*s)
+	    {
+	      /* Don't test has_arg with >, because some C compilers don't
+		 allow it to be used on enums. */
+	      if (pfound->has_arg)
+		optarg = s + 1;
+	      else
+		{
+		  if (opterr)
+		    {
+		      if (argv[optind - 1][1] == '-')
+			/* --option */
+			fprintf (stderr,
+				 "%s: option `--%s' doesn't allow an argument\n",
+				 argv[0], pfound->name);
+		      else
+			/* +option or -option */
+			fprintf (stderr,
+			     "%s: option `%c%s' doesn't allow an argument\n",
+			     argv[0], argv[optind - 1][0], pfound->name);
+		    }
+		  nextchar += strlen (nextchar);
+		  return '?';
+		}
+	    }
+	  else if (pfound->has_arg == 1)
+	    {
+	      if (optind < argc)
+		optarg = argv[optind++];
+	      else
+		{
+		  if (opterr)
+		    fprintf (stderr, "%s: option `%s' requires an argument\n",
+			     argv[0], argv[optind - 1]);
+		  nextchar += strlen (nextchar);
+		  return '?';
+		}
+	    }
+	  nextchar += strlen (nextchar);
+	  if (longind != NULL)
+	    *longind = option_index;
+	  if (pfound->flag)
+	    {
+	      *(pfound->flag) = pfound->val;
+	      return 0;
+	    }
+	  return pfound->val;
+	}
+      /* Can't find it as a long option.  If this is not getopt_long_only,
+	 or the option starts with '--' or is not a valid short
+	 option, then it's an error.
+	 Otherwise interpret it as a short option. */
+      if (!long_only || argv[optind][1] == '-'
+#ifdef GETOPT_COMPAT
+	  || argv[optind][0] == '+'
+#endif				/* GETOPT_COMPAT */
+	  || my_index (optstring, *nextchar) == NULL)
+	{
+	  if (opterr)
+	    {
+	      if (argv[optind][1] == '-')
+		/* --option */
+		fprintf (stderr, "%s: unrecognized option `--%s'\n",
+			 argv[0], nextchar);
+	      else
+		/* +option or -option */
+		fprintf (stderr, "%s: unrecognized option `%c%s'\n",
+			 argv[0], argv[optind][0], nextchar);
+	    }
+	  nextchar += strlen (nextchar);
+	  optind++;
+	  return '?';
+	}
+    }
+
+  /* Look at and handle the next option-character.  */
+
+  {
+    char c = *nextchar++;
+    char *temp = my_index (optstring, c);
+
+    /* Increment `optind' when we start to process its last character.  */
+    if (*nextchar == '\0')
+      optind++;
+
+    if (temp == NULL || c == ':')
+      {
+	if (opterr)
+	  {
+	    if (c < 040 || c >= 0177)
+	      fprintf (stderr, "%s: unrecognized option, character code 0%o\n",
+		       argv[0], c);
+	    else
+	      fprintf (stderr, "%s: unrecognized option `-%c'\n", argv[0], c);
+	  }
+	return '?';
+      }
+    if (temp[1] == ':')
+      {
+	if (temp[2] == ':')
+	  {
+	    /* This is an option that accepts an argument optionally.  */
+	    if (*nextchar != '\0')
+	      {
+		optarg = nextchar;
+		optind++;
+	      }
+	    else
+	      optarg = 0;
+	    nextchar = NULL;
+	  }
+	else
+	  {
+	    /* This is an option that requires an argument.  */
+	    if (*nextchar != 0)
+	      {
+		optarg = nextchar;
+		/* If we end this ARGV-element by taking the rest as an arg,
+		   we must advance to the next element now.  */
+		optind++;
+	      }
+	    else if (optind == argc)
+	      {
+		if (opterr)
+		  fprintf (stderr, "%s: option `-%c' requires an argument\n",
+			   argv[0], c);
+		c = '?';
+	      }
+	    else
+	      /* We already incremented `optind' once;
+		 increment it again when taking next ARGV-elt as argument.  */
+	      optarg = argv[optind++];
+	    nextchar = NULL;
+	  }
+      }
+    return c;
+  }
+}
+
+int
+getopt (argc, argv, optstring)
+     int argc;
+     char *const *argv;
+     const char *optstring;
+{
+  return _getopt_internal (argc, argv, optstring,
+			   (const struct option *) 0,
+			   (int *) 0,
+			   0);
+}
+
+#ifdef TEST
+
+/* Compile with -DTEST to make an executable for use in testing
+   the above definition of `getopt'.  */
+
+int
+main (argc, argv)
+     int argc;
+     char **argv;
+{
+  int c;
+  int digit_optind = 0;
+
+  while (1)
+    {
+      int this_option_optind = optind ? optind : 1;
+
+      c = getopt (argc, argv, "abc:d:0123456789");
+      if (c == EOF)
+	break;
+
+      switch (c)
+	{
+	case '0':
+	case '1':
+	case '2':
+	case '3':
+	case '4':
+	case '5':
+	case '6':
+	case '7':
+	case '8':
+	case '9':
+	  if (digit_optind != 0 && digit_optind != this_option_optind)
+	    printf ("digits occur in two different argv-elements.\n");
+	  digit_optind = this_option_optind;
+	  printf ("option %c\n", c);
+	  break;
+
+	case 'a':
+	  printf ("option a\n");
+	  break;
+
+	case 'b':
+	  printf ("option b\n");
+	  break;
+
+	case 'c':
+	  printf ("option c with value `%s'\n", optarg);
+	  break;
+
+	case '?':
+	  break;
+
+	default:
+	  printf ("?? getopt returned character code 0%o ??\n", c);
+	}
+    }
+
+  if (optind < argc)
+    {
+      printf ("non-option ARGV-elements: ");
+      while (optind < argc)
+	printf ("%s ", argv[optind++]);
+      printf ("\n");
+    }
+
+  exit (0);
+}
+
+#endif /* TEST */
diff --git a/gnu/usr.bin/grep/getopt.h b/gnu/usr.bin/grep/getopt.h
new file mode 100644
index 000000000000..f64de3116163
--- /dev/null
+++ b/gnu/usr.bin/grep/getopt.h
@@ -0,0 +1,113 @@
+/* Declarations for getopt.
+   Copyright (C) 1989, 1990, 1991, 1992 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  */
+
+#ifndef _GETOPT_H
+#define _GETOPT_H 1
+
+/* For communication from `getopt' to the caller.
+   When `getopt' finds an option that takes an argument,
+   the argument value is returned here.
+   Also, when `ordering' is RETURN_IN_ORDER,
+   each non-option ARGV-element is returned here.  */
+
+extern char *optarg;
+
+/* Index in ARGV of the next element to be scanned.
+   This is used for communication to and from the caller
+   and for communication between successive calls to `getopt'.
+
+   On entry to `getopt', zero means this is the first call; initialize.
+
+   When `getopt' returns EOF, this is the index of the first of the
+   non-option elements that the caller should itself scan.
+
+   Otherwise, `optind' communicates from one call to the next
+   how much of ARGV has been scanned so far.  */
+
+extern int optind;
+
+/* Callers store zero here to inhibit the error message `getopt' prints
+   for unrecognized options.  */
+
+extern int opterr;
+
+/* Describe the long-named options requested by the application.
+   The LONG_OPTIONS argument to getopt_long or getopt_long_only is a vector
+   of `struct option' terminated by an element containing a name which is
+   zero.
+
+   The field `has_arg' is:
+   no_argument		(or 0) if the option does not take an argument,
+   required_argument	(or 1) if the option requires an argument,
+   optional_argument 	(or 2) if the option takes an optional argument.
+
+   If the field `flag' is not NULL, it points to a variable that is set
+   to the value given in the field `val' when the option is found, but
+   left unchanged if the option is not found.
+
+   To have a long-named option do something other than set an `int' to
+   a compiled-in constant, such as set a value from `optarg', set the
+   option's `flag' field to zero and its `val' field to a nonzero
+   value (the equivalent single-letter option character, if there is
+   one).  For long options that have a zero `flag' field, `getopt'
+   returns the contents of the `val' field.  */
+
+struct option
+{
+#if	__STDC__
+  const char *name;
+#else
+  char *name;
+#endif
+  /* has_arg can't be an enum because some compilers complain about
+     type mismatches in all the code that assumes it is an int.  */
+  int has_arg;
+  int *flag;
+  int val;
+};
+
+/* Names for the values of the `has_arg' field of `struct option'.  */
+
+enum _argtype
+{
+  no_argument,
+  required_argument,
+  optional_argument
+};
+
+#if __STDC__
+extern int getopt (int argc, char *const *argv, const char *shortopts);
+extern int getopt_long (int argc, char *const *argv, const char *shortopts,
+		        const struct option *longopts, int *longind);
+extern int getopt_long_only (int argc, char *const *argv,
+			     const char *shortopts,
+		             const struct option *longopts, int *longind);
+
+/* Internal only.  Users should not call this directly.  */
+extern int _getopt_internal (int argc, char *const *argv,
+			     const char *shortopts,
+		             const struct option *longopts, int *longind,
+			     int long_only);
+#else /* not __STDC__ */
+extern int getopt ();
+extern int getopt_long ();
+extern int getopt_long_only ();
+
+extern int _getopt_internal ();
+#endif /* not __STDC__ */
+
+#endif /* _GETOPT_H */
diff --git a/gnu/usr.bin/grep/grep.1 b/gnu/usr.bin/grep/grep.1
new file mode 100644
index 000000000000..57093b6403a3
--- /dev/null
+++ b/gnu/usr.bin/grep/grep.1
@@ -0,0 +1,234 @@
+.TH GREP 1 "1988 December 13" "GNU Project" \" -*- nroff -*-
+.UC 4
+.SH NAME
+grep, egrep \- print lines matching a regular expression
+.SH SYNOPSIS
+.B grep
+[
+.B \-CVbchilnsvwx
+] [
+.BI \- num
+] [
+.B \-AB
+.I num
+] [ [
+.B \-e
+]
+.I expr
+|
+.B \-f
+.I file
+] [
+.I "files ..."
+]
+.SH DESCRIPTION
+.I Grep
+searches the files listed in the arguments (or standard
+input if no files are given) for all lines that contain a match for
+the given
+.IR expr .
+If any lines match, they are printed.
+.PP
+Also, if any matches were found,
+.I grep
+exits with a status of 0, but if no matches were found it exits
+with a status of 1.  This is useful for building shell scripts that
+use
+.I grep
+as a condition for, for example, the
+.I if
+statement.
+.PP
+When invoked as
+.I egrep
+the syntax of the
+.I expr
+is slightly different; See below.
+.br
+.SH "REGULAR EXPRESSIONS"
+.RS 2.5i
+.ta 1i 2i
+.sp
+.ti -2.0i
+(grep)	(egrep)		(explanation)
+.sp
+.ti -2.0i
+\fIc\fP	\fIc\fP	a single (non-meta) character matches itself.
+.sp
+.ti -2.0i
+\&.	.	matches any single character except newline.
+.sp
+.ti -2.0i
+\\?	?	postfix operator; preceeding item is optional.
+.sp
+.ti -2.0i
+\(**	\(**	postfix operator; preceeding item 0 or
+more times.
+.sp
+.ti -2.0i
+\\+	+	postfix operator; preceeding item 1 or
+more times.
+.sp
+.ti -2.0i
+\\|	|	infix operator; matches either
+argument.
+.sp
+.ti -2.0i
+^	^	matches the empty string at the beginning of a line.
+.sp
+.ti -2.0i
+$	$	matches the empty string at the end of a line.
+.sp
+.ti -2.0i
+\\<	\\<	matches the empty string at the beginning of a word.
+.sp
+.ti -2.0i
+\\>	\\>	matches the empty string at the end of a word.
+.sp
+.ti -2.0i
+[\fIchars\fP]	[\fIchars\fP]	match any character in the given class; if the
+first character after [ is ^, match any character
+not in the given class; a range of characters may
+be specified by \fIfirst\-last\fP; for example, \\W
+(below) is equivalent to the class [^A\-Za\-z0\-9]
+.sp
+.ti -2.0i
+\\( \\)	( )	parentheses are used to override operator precedence.
+.sp
+.ti -2.0i
+\\\fIdigit\fP	\\\fIdigit\fP	\\\fIn\fP matches a repeat of the text
+matched earlier in the regexp by the subexpression inside the nth
+opening parenthesis.
+.sp
+.ti -2.0i
+\\	\\	any special character may be preceded
+by a backslash to match it literally.
+.sp
+.ti -2.0i
+(the following are for compatibility with GNU Emacs)
+.sp
+.ti -2.0i
+\\b	\\b	matches the empty string at the edge of a word.
+.sp
+.ti -2.0i
+\\B	\\B	matches the empty string if not at the edge of a word.
+.sp
+.ti -2.0i
+\\w	\\w	matches word-constituent characters (letters & digits).
+.sp
+.ti -2.0i
+\\W	\\W	matches characters that are not word-constituent.
+.RE
+.PP
+Operator precedence is (highest to lowest) ?, \(**, and +, concatenation,
+and finally |.  All other constructs are syntactically identical to
+normal characters.  For the truly interested, the file dfa.c describes
+(and implements) the exact grammar understood by the parser.
+.SH OPTIONS
+.TP
+.BI \-A " num"
+print <num> lines of context after every matching line
+.TP
+.BI \-B " num"
+print
+.I num
+lines of context before every matching line
+.TP
+.B \-C
+print 2 lines of context on each side of every match
+.TP
+.BI \- num
+print
+.I num
+lines of context on each side of every match
+.TP
+.B \-V
+print the version number on the diagnostic output
+.TP
+.B \-b
+print every match preceded by its byte offset
+.TP
+.B \-c
+print a total count of matching lines only
+.TP
+.BI \-e " expr"
+search for
+.IR expr ;
+useful if
+.I expr
+begins with \-
+.TP
+.BI \-f " file"
+search for the expression contained in
+.I file
+.TP
+.B \-h
+don't display filenames on matches
+.TP
+.B \-i
+ignore case difference when comparing strings
+.TP
+.B \-l
+list files containing matches only
+.TP
+.B \-n
+print each match preceded by its line number
+.TP
+.B \-s
+run silently producing no output except error messages
+.TP
+.B \-v
+print only lines that contain no matches for the <expr>
+.TP
+.B \-w
+print only lines where the match is a complete word
+.TP
+.B \-x
+print only lines where the match is a whole line
+.SH "SEE ALSO"
+emacs(1), ed(1), sh(1),
+.I "GNU Emacs Manual"
+.SH INCOMPATIBILITIES
+The following incompatibilities with UNIX
+.I grep
+exist:
+.PP
+.RS 0.5i
+The context-dependent meaning of \(** is not quite the same (grep only).
+.PP
+.B \-b
+prints a byte offset instead of a block offset.
+.PP
+The {\fIm,n\fP} construct of System V grep is not implemented.
+.PP
+.SH BUGS
+GNU \fIe?grep\fP has been thoroughly debugged and tested over a period
+of several years; we think it's a reliable beast or we wouldn't
+distribute it.  If by some fluke of the universe you discover a bug,
+send a detailed description (including options, regular expressions,
+and a copy of an input file that can reproduce it) to mike@ai.mit.edu.
+.PP
+.SH AUTHORS
+Mike Haertel wrote the deterministic regexp code and the bulk
+of the program.
+.PP
+James A. Woods is responsible for the hybridized search strategy
+of using Boyer-Moore-Gosper fixed-string search as a filter
+before calling the general regexp matcher.
+.PP
+Arthur David Olson contributed code that finds fixed strings for
+the aforementioned BMG search for a large class of regexps.
+.PP
+Richard Stallman wrote the backtracking regexp matcher that is used
+for \\\fIdigit\fP backreferences, as well as the GNU getopt.  The
+backtracking matcher was originally written for GNU Emacs.
+.PP
+D. A. Gwyn wrote the C alloca emulation that is provided so
+System V machines can run this program.  (Alloca is used only
+by RMS' backtracking matcher, and then only rarely, so there
+is no loss if your machine doesn't have a "real" alloca.)
+.PP
+Scott Anderson and Henry Spencer designed the regression tests
+used in the "regress" script.
+.PP
+Paul Placeway wrote the original version of this manual page.
diff --git a/gnu/usr.bin/grep/grep.c b/gnu/usr.bin/grep/grep.c
new file mode 100644
index 000000000000..1c45c45fa425
--- /dev/null
+++ b/gnu/usr.bin/grep/grep.c
@@ -0,0 +1,1007 @@
+/* grep - print lines matching an extended regular expression
+   Copyright (C) 1988 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  */
+
+/* Written June, 1988 by Mike Haertel
+   BMG speedups added July, 1988 by James A. Woods and Arthur David Olson  */
+
+#include <stdio.h>
+
+#if defined(USG) || defined(STDC_HEADERS)
+#include <string.h>
+#ifndef bcopy
+#define bcopy(s,d,n)	memcpy((d),(s),(n))
+#endif
+#ifndef index
+#define index strchr
+#endif
+#else
+#include <strings.h>
+#endif
+
+#ifdef HAVE_UNISTD_H
+#include <unistd.h>
+#endif
+
+#ifndef STDC_HEADERS
+extern char *getenv();
+#endif
+extern int errno;
+
+extern char *sys_errlist[];
+
+#include "dfa.h"
+#include "regex.h"
+#include "getopt.h"
+
+/* Used by -w */
+#define WCHAR(C) (ISALNUM(C) || (C) == '_')
+
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* Exit status codes. */
+#define MATCHES_FOUND 0		/* Exit 0 if no errors and matches found. */
+#define NO_MATCHES_FOUND 1	/* Exit 1 if no matches were found. */
+#define ERROR 2			/* Exit 2 if some error occurred. */
+
+/* Error is set true if something awful happened. */
+static int error;
+
+/* The program name for error messages. */
+static char *prog;
+
+/* We do all our own buffering by hand for efficiency. */
+static char *buffer;		/* The buffer itself, grown as needed. */
+static bufbytes;		/* Number of bytes in the buffer. */
+static size_t bufalloc;		/* Number of bytes allocated to the buffer. */
+static bufprev;			/* Number of bytes that have been forgotten.
+				   This is used to get byte offsets from the
+				   beginning of the file. */
+static bufread;			/* Number of bytes to get with each read(). */
+
+static void
+initialize_buffer()
+{
+  bufread = 8192;
+  bufalloc = bufread + bufread / 2;
+  buffer = malloc(bufalloc);
+  if (! buffer)
+    {
+      fprintf(stderr, "%s: Memory exhausted (%s)\n", prog,
+	      sys_errlist[errno]);
+      exit(ERROR);
+    }
+}
+
+/* The current input file. */
+static fd;
+static char *filename;
+static eof;
+
+/* Fill the buffer retaining the last n bytes at the beginning of the
+   newly filled buffer (for backward context).  Returns the number of new
+   bytes read from disk. */
+static int
+fill_buffer_retaining(n)
+     int n;
+{
+  char *p, *q;
+  int i;
+
+  /* See if we need to grow the buffer. */
+  if (bufalloc - n <= bufread)
+    {
+      while (bufalloc - n <= bufread)
+	{
+	  bufalloc *= 2;
+	  bufread *= 2;
+	}
+      buffer = realloc(buffer, bufalloc);
+      if (! buffer)
+	{
+	  fprintf(stderr, "%s: Memory exhausted (%s)\n", prog,
+		  sys_errlist[errno]);
+	  exit(ERROR);
+	}
+    }
+
+  bufprev += bufbytes - n;
+
+  /* Shift stuff down. */
+  for (i = n, p = buffer, q = p + bufbytes - n; i--; )
+    *p++ = *q++;
+  bufbytes = n;
+
+  if (eof)
+    return 0;
+
+  /* Read in new stuff. */
+  i = read(fd, buffer + bufbytes, bufread);
+  if (i < 0)
+    {
+      fprintf(stderr, "%s: read on %s failed (%s)\n", prog,
+	      filename ? filename : "<stdin>", sys_errlist[errno]);
+      error = 1;
+    }
+
+  /* Kludge to pretend every nonempty file ends with a newline. */
+  if (i == 0 && bufbytes > 0 && buffer[bufbytes - 1] != '\n')
+    {
+      eof = i = 1;
+      buffer[bufbytes] = '\n';
+    }
+
+  bufbytes += i;
+  return i;
+}
+
+/* Various flags set according to the argument switches. */
+static trailing_context;	/* Lines of context to show after matches. */
+static leading_context;		/* Lines of context to show before matches. */
+static byte_count;		/* Precede output lines the byte count of the
+				   first character on the line. */
+static no_filenames;		/* Do not display filenames. */
+static line_numbers;		/* Precede output lines with line numbers. */
+static silent;			/* Produce no output at all.  This switch
+				   is bogus, ever hear of /dev/null? */
+static int whole_word;		/* Match only whole words.  Note that if
+				   backreferences are used this depends on
+				   the regex routines getting leftmost-longest
+				   right, which they don't right now if |
+				   is also used. */
+static int whole_line;		/* Match only whole lines.  Backreference
+				   caveat applies here too.  */
+static nonmatching_lines;	/* Print lines that don't match the regexp. */
+
+static bmgexec;			/* Invoke Boyer-Moore-Gosper routines */
+
+/* The compiled regular expression lives here. */
+static struct regexp reg;
+
+/* The compiled regular expression for the backtracking matcher lives here. */
+static struct re_pattern_buffer regex;
+
+/* Pointer in the buffer after the last character printed. */
+static char *printed_limit;
+
+/* True when printed_limit has been artifically advanced without printing
+   anything. */
+static int printed_limit_fake;
+
+/* Print a line at the given line number, returning the number of
+   characters actually printed.  Matching is true if the line is to
+   be considered a "matching line".  This is only meaningful if
+   surrounding context is turned on. */
+static int
+print_line(p, number, matching)
+     char *p;
+     int number;
+     int matching;
+{
+  int count = 0;
+
+  if (silent)
+    {
+      do
+	++count;
+      while (*p++ != '\n');
+      printed_limit_fake = 0;
+      printed_limit = p;
+      return count;
+    }
+
+  if (filename && !no_filenames)
+    printf("%s%c", filename, matching ? ':' : '-');
+  if (byte_count)
+    printf("%d%c", p - buffer + bufprev, matching ? ':' : '-');
+  if (line_numbers)
+    printf("%d%c", number, matching ? ':' : '-');
+  do
+    {
+      ++count;
+      putchar(*p);
+    }
+  while (*p++ != '\n');
+  printed_limit_fake = 0;
+  printed_limit = p;
+  return count;
+}
+
+/* Print matching or nonmatching lines from the current file.  Returns a
+   count of matching or nonmatching lines. */
+static int
+grep()
+{
+  int retain = 0;		/* Number of bytes to retain on next call
+				   to fill_buffer_retaining(). */
+  char *search_limit;		/* Pointer to the character after the last
+				   newline in the buffer. */
+  char saved_char;		/* Character after the last newline. */
+  char *resume;			/* Pointer to where to resume search. */
+  int resume_index = 0;		/* Count of characters to ignore after
+				   refilling the buffer. */
+  int line_count = 1;		/* Line number. */
+  int try_backref;		/* Set to true if we need to verify the
+				   match with a backtracking matcher. */
+  int initial_line_count;	/* Line count at beginning of last search. */
+  char *match;			/* Pointer to the first character after the
+				   string matching the regexp. */
+  int match_count = 0;		/* Count of matching lines. */
+  char *matching_line;		/* Pointer to first character of the matching
+				   line, or of the first line of context to
+				   print if context is turned on. */
+  char *real_matching_line;	/* Pointer to the first character of the
+				   real matching line. */
+  char *next_line;		/* Pointer to first character of the line
+				   following the matching line. */
+  char *last_match_limit;	/* Pointer after last matched line. */
+  int pending_lines = 0;	/* Lines of context left over from last match
+				   that we have to print. */
+  static first_match = 1;	/* True when nothing has been printed. */
+  int i;
+  char *tmp;
+  char *execute();
+
+  printed_limit_fake = 0;
+  
+  while (fill_buffer_retaining(retain) > 0)
+    {
+      /* Find the last newline in the buffer. */
+      search_limit = buffer + bufbytes;
+      while (search_limit > buffer && search_limit[-1] != '\n')
+	--search_limit;
+      if (search_limit == buffer)
+	{
+	  retain = bufbytes;
+	  continue;
+	}
+
+      /* Save the character after the last newline so regexecute can write
+	 its own sentinel newline. */
+      saved_char = *search_limit;
+
+      /* Search the buffer for a match. */
+      printed_limit = buffer;
+      resume = buffer + resume_index;
+      last_match_limit = resume;
+      initial_line_count = line_count;
+
+
+      /* In retrospect, I have to say that the following code sucks.
+	 For an example of how to do this right, see the fgrep
+	 driver program that I wrote around a year later.  I'm
+	 too lazy to retrofit that to egrep right now (the
+	 pattern matchers have different needs).  */
+
+
+      while (match = execute(&reg, resume, search_limit, 0, &line_count, &try_backref))
+	{
+	  /* Find the beginning of the matching line. */
+	  matching_line = match;
+	  while (matching_line > resume && matching_line[-1] != '\n')
+	    --matching_line;
+	  real_matching_line = matching_line;
+
+	  /* Find the beginning of the next line. */
+	  next_line = match;
+	  while (next_line < search_limit && *next_line++ != '\n')
+	    ;
+
+	  /* If a potential backreference is indicated, try it out with
+	     a backtracking matcher to make sure the line is a match.
+	     This is hairy because we need to handle whole_line and
+	     whole_word matches specially.  The method was stolen from
+	     GNU fgrep.  */
+	  if (try_backref)
+	    {
+	      struct re_registers regs;
+	      int beg, len, maxlen, ret;
+
+	      beg = 0;
+	      for (maxlen = next_line - matching_line - 1; beg <= maxlen; ++beg)
+		{
+		  /* See if the matching line matches when backreferences
+		     are considered... */
+		  ret = re_search (&regex, matching_line, maxlen,
+				   beg, maxlen - beg, &regs);
+		  if (ret == -1)
+		    goto fail;
+		  beg = ret;
+		  len = regs.end[0] - beg;
+		  /* Ok, now check if it subsumed the whole line if -x */
+		  if (whole_line && (beg != 0 || len != maxlen))
+		    goto fail;
+		  /* If -w then check if the match aligns with word
+		     boundaries.  We have to do this iteratively, because
+		     (a) The line may contain more than one occurence
+		     of the pattern, and;
+		     (b) Several alternatives in the pattern might
+		     be valid at a given point, and we may need to
+		     consider a shorter one in order to align with
+		     word boundaries.  */
+		  else if (whole_word)
+		    while (len > 0)
+		      {
+			/* If it's preceeded by a word constituent, then no go.  */
+			if (beg > 0
+			    && WCHAR((unsigned char) matching_line[beg - 1]))
+			  break;
+			/* If it's followed by a word constituent, look for
+			   a shorter match.  */
+			else if (beg + len < maxlen
+			    && WCHAR((unsigned char) matching_line[beg + len]))
+			  /* This is sheer incest. */
+			  len = re_match_2 (&regex, (unsigned char *) 0, 0,
+					    matching_line, maxlen,
+					    beg, &regs, beg + len - 1);
+			else
+			  goto succeed;
+		      }
+		  else
+		    goto succeed;
+		}
+	    fail:
+	      resume = next_line;
+	      if (resume == search_limit)
+		break;
+	      else
+		continue;
+	    }
+
+	succeed:
+	  /* Print out the matching or nonmatching lines as necessary. */
+	  if (! nonmatching_lines)
+	    {
+	      /* Not -v, so nothing hairy... */
+	      ++match_count;
+
+	      /* Print leftover trailing context from last time around.  */
+	      while (pending_lines && last_match_limit < matching_line)
+		{
+		  last_match_limit += print_line(last_match_limit,
+						 initial_line_count++,
+						 0);
+		  --pending_lines;
+		}
+
+	      /* Back up over leading context if necessary. */
+	      for (i = leading_context;
+		   i > 0 && matching_line > printed_limit;
+		   --i)
+		{
+		  while (matching_line > printed_limit
+			 && (--matching_line)[-1] != '\n')
+		    ;
+		  --line_count;
+		}
+
+	      /* If context is enabled, we may have to print a separator. */
+	      if ((leading_context || trailing_context) && !silent
+		  && !first_match && (printed_limit_fake
+				      || matching_line > printed_limit))
+		printf("----------\n");
+	      first_match = 0;
+
+	      /* Print the matching line and its leading context. */
+	      while (matching_line < real_matching_line)
+		matching_line += print_line(matching_line, line_count++, 0);
+	      matching_line += print_line(matching_line, line_count++, 1);
+
+	      /* If there's trailing context, leave some lines pending until
+		 next time. */
+	      pending_lines = trailing_context;
+	    }
+	  else if (matching_line == last_match_limit)
+	    {
+	      /* In the -v case, this is where we deal with leftover
+		 trailing context from last time... */
+	      if (pending_lines > 0)
+		{
+		  --pending_lines;
+		  print_line(matching_line, line_count, 0);
+		}
+	      ++line_count;
+	    }
+	  else if (matching_line > last_match_limit)
+	    {
+	      char *start = last_match_limit;
+
+	      /* Back up over leading context if necessary. */
+	      for (i = leading_context; start > printed_limit && i; --i)
+		{
+		  while (start > printed_limit && (--start)[-1] != '\n')
+		    ;
+		  --initial_line_count;
+		}
+
+	      /* If context is enabled, we may have to print a separator. */
+	      if ((leading_context || trailing_context) && !silent
+		  && !first_match && (printed_limit_fake
+				      || start > printed_limit))
+		printf("----------\n");
+	      first_match = 0;
+
+	      /* Print out the presumably matching leading context. */
+	      while (start < last_match_limit)
+		start += print_line(start, initial_line_count++, 0);
+
+	      /* Print out the nonmatching lines prior to the matching line. */
+	      while (start < matching_line)
+		{
+		  /* This counts as a "matching line" in -v. */
+		  ++match_count;
+		  start += print_line(start, initial_line_count++, 1);
+		}
+
+	      /* Deal with trailing context.  In -v what this means is
+		 we print the current (matching) line, marked as a non
+		 matching line.  */
+	      if (trailing_context)
+		{
+		  print_line(matching_line, line_count, 0);
+		  pending_lines = trailing_context - 1;
+		}
+
+	      /* Count the current line. */
+	      ++line_count;
+	    }
+	  else
+	    /* Let us pray this never happens... */
+	    abort();
+
+	  /* Resume searching at the beginning of the next line. */
+	  initial_line_count = line_count;
+	  resume = next_line;
+	  last_match_limit = next_line;
+
+	  if (resume == search_limit)
+	    break;
+	}
+ 
+      /* Restore the saved character. */
+      *search_limit = saved_char;
+
+      if (! nonmatching_lines)
+	{
+	  while (last_match_limit < search_limit && pending_lines)
+	    {
+	      last_match_limit += print_line(last_match_limit,
+					     initial_line_count++,
+					     0);
+	      --pending_lines;
+	    }
+	}
+      else if (search_limit > last_match_limit)
+	{
+	  char *start = last_match_limit;
+
+	  /* Back up over leading context if necessary. */
+	  for (i = leading_context; start > printed_limit && i; --i)
+	    {
+	      while (start > printed_limit && (--start)[-1] != '\n')
+		;
+	      --initial_line_count;
+	    }
+
+	  /* If context is enabled, we may have to print a separator. */
+	  if ((leading_context || trailing_context) && !silent
+	      && !first_match && (printed_limit_fake
+				  || start > printed_limit))
+	    printf("----------\n");
+	  first_match = 0;
+
+	  /* Print out all the nonmatching lines up to the search limit. */
+	  while (start < last_match_limit)
+	    start += print_line(start, initial_line_count++, 0);
+	  while (start < search_limit)
+	    {
+	      ++match_count;
+	      start += print_line(start, initial_line_count++, 1);
+	    }
+
+	  pending_lines = trailing_context;
+	  resume_index = 0;
+	  retain = bufbytes - (search_limit - buffer);
+	  continue;
+	}
+      
+      /* Save the trailing end of the buffer for possible use as leading
+	 context in the future. */
+      i = leading_context;
+      tmp = search_limit;
+      while (tmp > printed_limit && i--)
+	while (tmp > printed_limit && (--tmp)[-1] != '\n')
+	  ;
+      resume_index = search_limit - tmp;
+      retain = bufbytes - (tmp - buffer);
+      if (tmp > printed_limit)
+	printed_limit_fake = 1;
+    }
+
+  return match_count;
+}
+
+void
+usage_and_die()
+{
+  fprintf(stderr, "\
+Usage: %s [-CVbchilnsvwx] [-num] [-A num] [-B num] [-f file]\n\
+       [-e] expr [file...]\n", prog);
+  exit(ERROR);
+}
+
+static char version[] = "GNU e?grep, version 1.6";
+
+int
+main(argc, argv)
+     int argc;
+     char **argv;
+{
+  int c;
+  int ignore_case = 0;		/* Compile the regexp to ignore case. */
+  char *the_regexp = 0;		/* The regular expression. */
+  int regexp_len;		/* Length of the regular expression. */
+  char *regexp_file = 0;	/* File containing parallel regexps. */
+  int count_lines = 0;		/* Display only a count of matching lines. */
+  int list_files = 0;		/* Display only the names of matching files. */
+  int line_count = 0;		/* Count of matching lines for a file. */
+  int matches_found = 0;	/* True if matches were found. */
+  char *regex_errmesg;		/* Error message from regex routines. */
+  char translate[_NOTCHAR];	/* Translate table for case conversion
+				   (needed by the backtracking matcher). */
+
+  if (prog = index(argv[0], '/'))
+    ++prog;
+  else
+    prog = argv[0];
+
+  opterr = 0;
+  while ((c = getopt(argc, argv, "0123456789A:B:CVbce:f:hilnsvwx")) != EOF)
+    switch (c)
+      {
+      case '?':
+	usage_and_die();
+	break;
+
+      case '0':
+      case '1':
+      case '2':
+      case '3':
+      case '4':
+      case '5':
+      case '6':
+      case '7':
+      case '8':
+      case '9':
+	trailing_context = 10 * trailing_context + c - '0';
+	leading_context = 10 * leading_context + c - '0';
+	break;
+
+      case 'A':
+	if (! sscanf(optarg, "%d", &trailing_context)
+	    || trailing_context < 0)
+	  usage_and_die();
+	break;
+
+      case 'B':
+	if (! sscanf(optarg, "%d", &leading_context)
+	    || leading_context < 0)
+	  usage_and_die();
+	break;
+
+      case 'C':
+	trailing_context = leading_context = 2;
+	break;
+
+      case 'V':
+	fprintf(stderr, "%s\n", version);
+	break;
+
+      case 'b':
+	byte_count = 1;
+	break;
+
+      case 'c':
+	count_lines = 1;
+	silent = 1;
+	break;
+
+      case 'e':
+	/* It doesn't make sense to mix -f and -e. */
+	if (regexp_file)
+	  usage_and_die();
+	the_regexp = optarg;
+	break;
+
+      case 'f':
+	/* It doesn't make sense to mix -f and -e. */
+	if (the_regexp)
+	  usage_and_die();
+	regexp_file = optarg;
+	break;
+
+      case 'h':
+	no_filenames = 1;
+	break;
+
+      case 'i':
+	ignore_case = 1;
+	for (c = 0; c < _NOTCHAR; ++c)
+	  if (isupper(c))
+	    translate[c] = tolower(c);
+	  else
+	    translate[c] = c;
+	regex.translate = translate;
+	break;
+
+      case 'l':
+	list_files = 1;
+	silent = 1;
+	break;
+
+      case 'n':
+	line_numbers = 1;
+	break;
+
+      case 's':
+	silent = 1;
+	break;
+
+      case 'v':
+	nonmatching_lines = 1;
+	break;
+
+      case 'w':
+	whole_word = 1;
+	break;
+
+      case 'x':
+	whole_line = 1;
+	break;
+
+      default:
+	/* This can't happen. */
+	fprintf(stderr, "%s: getopt(3) let one by!\n", prog);
+	usage_and_die();
+	break;
+      }
+
+  /* Set the syntax depending on whether we are EGREP or not. */
+#ifdef EGREP
+  regsyntax(RE_SYNTAX_EGREP, ignore_case);
+  re_set_syntax(RE_SYNTAX_EGREP);
+#else
+  regsyntax(RE_SYNTAX_GREP, ignore_case);
+  re_set_syntax(RE_SYNTAX_GREP);
+#endif
+
+  /* Compile the regexp according to all the options. */
+  if (regexp_file)
+    {
+      FILE *fp = fopen(regexp_file, "r");
+      int len = 256;
+      int i = 0;
+
+      if (! fp)
+	{
+	  fprintf(stderr, "%s: %s: %s\n", prog, regexp_file,
+		  sys_errlist[errno]);
+	  exit(ERROR);
+	}
+
+      the_regexp = malloc(len);
+      while ((c = getc(fp)) != EOF)
+	{
+	  the_regexp[i++] = c;
+	  if (i == len)
+	    the_regexp = realloc(the_regexp, len *= 2);
+	}
+      fclose(fp);
+      /* Nuke the concluding newline so we won't match the empty string. */
+      if (i > 0 && the_regexp[i - 1] == '\n')
+	--i;
+      regexp_len = i;
+    }
+  else if (! the_regexp)
+    {
+      if (optind >= argc)
+	usage_and_die();
+      the_regexp = argv[optind++];
+      regexp_len = strlen(the_regexp);
+    }
+  else
+    regexp_len = strlen(the_regexp);
+  
+  if (whole_word || whole_line)
+    {
+      /* In the whole-word case, we use the pattern:
+	 (^|[^A-Za-z_])(userpattern)([^A-Za-z_]|$).
+	 In the whole-line case, we use the pattern:
+	 ^(userpattern)$.
+	 BUG: Using [A-Za-z_] is locale-dependent!  */
+
+      char *n = malloc(regexp_len + 50);
+      int i = 0;
+
+#ifdef EGREP
+      if (whole_word)
+	strcpy(n, "(^|[^A-Za-z_])(");
+      else
+	strcpy(n, "^(");
+#else
+      /* Todo:  Make *sure* this is the right syntax.  Down with grep! */
+      if (whole_word)
+	strcpy(n, "\\(^\\|[^A-Za-z_]\\)\\(");
+      else
+	strcpy(n, "^\\(");
+#endif
+      i = strlen(n);
+      bcopy(the_regexp, n + i, regexp_len);
+      i += regexp_len;
+#ifdef EGREP
+      if (whole_word)
+	strcpy(n + i, ")([^A-Za-z_]|$)");
+      else
+	strcpy(n + i, ")$");
+#else
+      if (whole_word)
+	strcpy(n + i, "\\)\\([^A-Za-z_]\\|$\\)");
+      else
+	strcpy(n + i, "\\)$");
+#endif
+      i += strlen(n + i);
+      regcompile(n, i, &reg, 1);
+    }
+  else
+    regcompile(the_regexp, regexp_len, &reg, 1);
+
+  
+  if (regex_errmesg = re_compile_pattern(the_regexp, regexp_len, &regex))
+    regerror(regex_errmesg);
+  
+  /*
+    Find the longest metacharacter-free string which must occur in the
+    regexpr, before short-circuiting regexecute() with Boyer-Moore-Gosper.
+    (Conjecture:  The problem in general is NP-complete.)  If there is no
+    such string (like for many alternations), then default to full automaton
+    search.  regmust() code and heuristics [see dfa.c] courtesy
+    Arthur David Olson.
+    */
+  if (line_numbers == 0 && nonmatching_lines == 0)
+    {
+      if (reg.mustn == 0 || reg.mustn == MUST_MAX ||
+  	  index(reg.must, '\0') != reg.must + reg.mustn)
+	bmgexec = 0;
+      else
+	{
+	  reg.must[reg.mustn] = '\0';
+	  if (getenv("MUSTDEBUG") != NULL)
+	    (void) printf("must have: \"%s\"\n", reg.must);
+	  bmg_setup(reg.must, ignore_case);
+	  bmgexec = 1;
+	}
+    }
+  
+  if (argc - optind < 2)
+    no_filenames = 1;
+
+  initialize_buffer();
+
+  if (argc > optind)
+    while (optind < argc)
+      {
+	bufprev = eof = 0;
+	filename = argv[optind++];
+	fd = open(filename, 0, 0);
+	if (fd < 0)
+	  {
+	    fprintf(stderr, "%s: %s: %s\n", prog, filename,
+		    sys_errlist[errno]);
+	    error = 1;
+	    continue;
+	  }
+	if (line_count = grep())
+	  matches_found = 1;
+	close(fd);
+	if (count_lines)
+	  if (!no_filenames)
+	    printf("%s:%d\n", filename, line_count);
+	  else
+	    printf("%d\n", line_count);
+	else if (list_files && line_count)
+	  printf("%s\n", filename);
+      }
+  else
+    {
+      if (line_count = grep())
+	matches_found = 1;
+      if (count_lines)
+	printf("%d\n", line_count);
+      else if (list_files && line_count)
+	printf("<stdin>\n");
+    }
+
+  if (error)
+    exit(ERROR);
+  if (matches_found)
+    exit(MATCHES_FOUND);
+  exit(NO_MATCHES_FOUND);
+  return NO_MATCHES_FOUND;
+}
+
+/* Needed by the regexp routines.  This could be fancier, especially when
+   dealing with parallel regexps in files. */
+void
+regerror(s)
+     const char *s;
+{
+  fprintf(stderr, "%s: %s\n", prog, s);
+  exit(ERROR);
+}
+
+/*
+   bmg_setup() and bmg_search() adapted from:
+     Boyer/Moore/Gosper-assisted 'egrep' search, with delta0 table as in
+     original paper (CACM, October, 1977).  No delta1 or delta2.  According to
+     experiment (Horspool, Soft. Prac. Exp., 1982), delta2 is of minimal
+     practical value.  However, to improve for worst case input, integrating
+     the improved Galil strategies (Apostolico/Giancarlo, Siam. J. Comput.,
+     February 1986) deserves consideration.
+
+     James A. Woods				Copyleft (C) 1986, 1988
+     NASA Ames Research Center
+*/
+
+char *
+execute(r, begin, end, newline, count, try_backref)
+  struct regexp *r;
+  char *begin;
+  char *end;
+  int newline;
+  int *count;
+  int *try_backref;
+{
+  register char *p, *s;
+  char *match;
+  char *start = begin;
+  char save;			/* regexecute() sentinel */
+  int len;
+  char *bmg_search();
+
+  if (!bmgexec)			/* full automaton search */
+    return(regexecute(r, begin, end, newline, count, try_backref));
+  else
+    {
+      len = end - begin; 
+      while ((match = bmg_search((unsigned char *) start, len)) != NULL)
+	{
+	  p = match;		/* narrow search range to submatch line */
+	  while (p > begin && *p != '\n')
+	    p--;
+	  s = match;
+	  while (s < end && *s != '\n')
+	    s++;
+	  s++;
+
+	  save = *s;
+	  *s = '\0';
+	  match = regexecute(r, p, s, newline, count, try_backref);
+	  *s = save;
+
+	  if (match != NULL)
+	    return((char *) match);
+	  else
+	    {
+	      start = s;
+	      len = end - start;
+	    }
+	}
+      return(NULL);
+    }
+}
+
+int		delta0[256];
+unsigned char   cmap[256];		/* (un)folded characters */
+unsigned char	pattern[5000];
+int		patlen;
+
+char *
+bmg_search(buffer, buflen)
+  unsigned char *buffer;
+  int buflen;
+{
+  register unsigned char *k, *strend, *s, *buflim;
+  register int t;
+  int j;
+
+  if (patlen > buflen)
+    return NULL;
+
+  buflim = buffer + buflen;
+  if (buflen > patlen * 4)
+    strend = buflim - patlen * 4;
+  else
+    strend = buffer;
+
+  s = buffer;
+  k = buffer + patlen - 1;
+
+  for (;;)
+    {
+      /* The dreaded inner loop, revisited. */
+      while (k < strend && (t = delta0[*k]))
+	{
+	  k += t;
+	  k += delta0[*k];
+	  k += delta0[*k];
+	}
+      while (k < buflim && delta0[*k])
+	++k;
+      if (k == buflim)
+	break;
+    
+      j = patlen - 1;
+      s = k;
+      while (--j >= 0 && cmap[*--s] == pattern[j])
+	;
+      /* 
+	delta-less shortcut for literati, but 
+	short shrift for genetic engineers.
+      */
+      if (j >= 0)
+	k++;
+      else 		/* submatch */
+	return ((char *)k);
+    }
+  return(NULL);
+}
+
+int
+bmg_setup(pat, folded)			/* compute "boyer-moore" delta table */
+  char *pat;
+  int folded;
+{					/* ... HAKMEM lives ... */
+  int j;
+
+  patlen = strlen(pat);
+
+  if (folded) 				/* fold case while saving pattern */
+    for (j = 0; j < patlen; j++) 
+      pattern[j] = (isupper((int) pat[j]) ?
+	(char) tolower((int) pat[j]) : pat[j]);
+  else
+      bcopy(pat, pattern, patlen);
+
+  for (j = 0; j < 256; j++)
+    {
+      delta0[j] = patlen;
+      cmap[j] = (char) j;		/* could be done at compile time */
+    }
+  for (j = 0; j < patlen - 1; j++)
+    delta0[pattern[j]] = patlen - j - 1;
+  delta0[pattern[patlen - 1]] = 0;
+
+  if (folded)
+    {
+      for (j = 0; j < patlen - 1; j++)
+	if (islower((int) pattern[j]))
+	  delta0[toupper((int) pattern[j])] = patlen - j - 1;
+	if (islower((int) pattern[patlen - 1]))
+	  delta0[toupper((int) pattern[patlen - 1])] = 0;
+      for (j = 'A'; j <= 'Z'; j++)
+	cmap[j] = (char) tolower((int) j);
+    }
+}
diff --git a/gnu/usr.bin/grep/regex.c b/gnu/usr.bin/grep/regex.c
new file mode 100644
index 000000000000..d69e78438cad
--- /dev/null
+++ b/gnu/usr.bin/grep/regex.c
@@ -0,0 +1,1768 @@
+/* Extended regular expression matching and search library.
+   Copyright (C) 1985, 1989 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+
+   In other words, you are welcome to use, share and improve this program.
+   You are forbidden to forbid anyone else to use, share and improve
+   what you give them.   Help stamp out software-hoarding!  */
+
+
+/* To test, compile with -Dtest.
+ This Dtestable feature turns this into a self-contained program
+ which reads a pattern, describes how it compiles,
+ then reads a string and searches for it.  */
+
+/* AIX requires this to be the first thing in the file. */
+#ifdef __GNUC__
+#undef alloca
+#define alloca __builtin_alloca
+#else /* not __GNUC__ */
+#if defined(sparc) && !defined(USG) && !defined(SVR4) && !defined(__svr4__)
+#include <alloca.h>
+#else
+#ifdef _AIX
+ #pragma alloca
+#else
+char *alloca ();
+#endif
+#endif /* sparc */
+#endif /* not __GNUC__ */
+
+
+#ifdef emacs
+
+/* The `emacs' switch turns on certain special matching commands
+ that make sense only in emacs. */
+
+#include "config.h"
+#include "lisp.h"
+#include "buffer.h"
+#include "syntax.h"
+
+#else  /* not emacs */
+
+#if defined(USG) || defined(STDC_HEADERS)
+#include <string.h>
+#ifndef bcopy
+#define bcopy(s,d,n)	memcpy((d),(s),(n))
+#endif
+#ifndef bcmp
+#define bcmp(s1,s2,n)	memcmp((s1),(s2),(n))
+#endif
+#ifndef bzero
+#define bzero(s,n)	memset((s),0,(n))
+#endif
+#endif
+
+#ifdef STDC_HEADERS
+#include <stdlib.h>
+#else
+char *malloc ();
+#endif
+
+/*
+ * Define the syntax stuff, so we can do the \<...\> things.
+ */
+
+#ifndef Sword /* must be non-zero in some of the tests below... */
+#define Sword 1
+#endif
+
+#define SYNTAX(c) re_syntax_table[c]
+
+#ifdef SYNTAX_TABLE
+
+char *re_syntax_table;
+
+#else
+
+static char re_syntax_table[256];
+
+static void
+init_syntax_once ()
+{
+   register int c;
+   static int done = 0;
+
+   if (done)
+     return;
+
+   bzero (re_syntax_table, sizeof re_syntax_table);
+
+   for (c = 'a'; c <= 'z'; c++)
+     re_syntax_table[c] = Sword;
+
+   for (c = 'A'; c <= 'Z'; c++)
+     re_syntax_table[c] = Sword;
+
+   for (c = '0'; c <= '9'; c++)
+     re_syntax_table[c] = Sword;
+
+   done = 1;
+}
+
+#endif /* SYNTAX_TABLE */
+#endif /* not emacs */
+
+#include "regex.h"
+
+/* Number of failure points to allocate space for initially,
+ when matching.  If this number is exceeded, more space is allocated,
+ so it is not a hard limit.  */
+
+#ifndef NFAILURES
+#define NFAILURES 80
+#endif /* NFAILURES */
+
+/* width of a byte in bits */
+
+#define BYTEWIDTH 8
+
+#ifndef SIGN_EXTEND_CHAR
+#ifdef __CHAR_UNSIGNED__
+#define SIGN_EXTEND_CHAR(c) ((c)>(char)127?(c)-256:(c))
+#else
+#define SIGN_EXTEND_CHAR(x) (x)
+#endif
+#endif
+
+static int obscure_syntax = 0;
+
+/* Specify the precise syntax of regexp for compilation.
+   This provides for compatibility for various utilities
+   which historically have different, incompatible syntaxes.
+
+   The argument SYNTAX is a bit-mask containing the two bits
+   RE_NO_BK_PARENS and RE_NO_BK_VBAR.  */
+
+int
+re_set_syntax (syntax)
+     int syntax;
+{
+  int ret;
+
+  ret = obscure_syntax;
+  obscure_syntax = syntax;
+  return ret;
+}
+
+/* re_compile_pattern takes a regular-expression string
+   and converts it into a buffer full of byte commands for matching.
+
+  PATTERN   is the address of the pattern string
+  SIZE      is the length of it.
+  BUFP	    is a  struct re_pattern_buffer *  which points to the info
+	    on where to store the byte commands.
+	    This structure contains a  char *  which points to the
+	    actual space, which should have been obtained with malloc.
+	    re_compile_pattern may use  realloc  to grow the buffer space.
+
+  The number of bytes of commands can be found out by looking in
+  the  struct re_pattern_buffer  that bufp pointed to,
+  after re_compile_pattern returns.
+*/
+
+#define PATPUSH(ch) (*b++ = (char) (ch))
+
+#define PATFETCH(c) \
+ {if (p == pend) goto end_of_pattern; \
+  c = * (unsigned char *) p++; \
+  if (translate) c = translate[c]; }
+
+#define PATFETCH_RAW(c) \
+ {if (p == pend) goto end_of_pattern; \
+  c = * (unsigned char *) p++; }
+
+#define PATUNFETCH p--
+
+#define EXTEND_BUFFER \
+  { char *old_buffer = bufp->buffer; \
+    if (bufp->allocated == (1<<16)) goto too_big; \
+    bufp->allocated *= 2; \
+    if (bufp->allocated > (1<<16)) bufp->allocated = (1<<16); \
+    if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \
+      goto memory_exhausted; \
+    c = bufp->buffer - old_buffer; \
+    b += c; \
+    if (fixup_jump) \
+      fixup_jump += c; \
+    if (laststart) \
+      laststart += c; \
+    begalt += c; \
+    if (pending_exact) \
+      pending_exact += c; \
+  }
+
+static void store_jump (), insert_jump ();
+
+char *
+re_compile_pattern (pattern, size, bufp)
+     char *pattern;
+     int size;
+     struct re_pattern_buffer *bufp;
+{
+  register char *b = bufp->buffer;
+  register char *p = pattern;
+  char *pend = pattern + size;
+  register unsigned c, c1;
+  char *p1;
+  unsigned char *translate = (unsigned char *) bufp->translate;
+
+  /* address of the count-byte of the most recently inserted "exactn" command.
+    This makes it possible to tell whether a new exact-match character
+    can be added to that command or requires a new "exactn" command. */
+     
+  char *pending_exact = 0;
+
+  /* address of the place where a forward-jump should go
+    to the end of the containing expression.
+    Each alternative of an "or", except the last, ends with a forward-jump
+    of this sort. */
+
+  char *fixup_jump = 0;
+
+  /* address of start of the most recently finished expression.
+    This tells postfix * where to find the start of its operand. */
+
+  char *laststart = 0;
+
+  /* In processing a repeat, 1 means zero matches is allowed */
+
+  char zero_times_ok;
+
+  /* In processing a repeat, 1 means many matches is allowed */
+
+  char many_times_ok;
+
+  /* address of beginning of regexp, or inside of last \( */
+
+  char *begalt = b;
+
+  /* Stack of information saved by \( and restored by \).
+     Four stack elements are pushed by each \(:
+       First, the value of b.
+       Second, the value of fixup_jump.
+       Third, the value of regnum.
+       Fourth, the value of begalt.  */
+
+  int stackb[40];
+  int *stackp = stackb;
+  int *stacke = stackb + 40;
+  int *stackt;
+
+  /* Counts \('s as they are encountered.  Remembered for the matching \),
+     where it becomes the "register number" to put in the stop_memory command */
+
+  int regnum = 1;
+
+  bufp->fastmap_accurate = 0;
+
+#ifndef emacs
+#ifndef SYNTAX_TABLE
+  /*
+   * Initialize the syntax table.
+   */
+   init_syntax_once();
+#endif
+#endif
+
+  if (bufp->allocated == 0)
+    {
+      bufp->allocated = 28;
+      if (bufp->buffer)
+	/* EXTEND_BUFFER loses when bufp->allocated is 0 */
+	bufp->buffer = (char *) realloc (bufp->buffer, 28);
+      else
+	/* Caller did not allocate a buffer.  Do it for him */
+	bufp->buffer = (char *) malloc (28);
+      if (!bufp->buffer) goto memory_exhausted;
+      begalt = b = bufp->buffer;
+    }
+
+  while (p != pend)
+    {
+      if (b - bufp->buffer > bufp->allocated - 10)
+	/* Note that EXTEND_BUFFER clobbers c */
+	EXTEND_BUFFER;
+
+      PATFETCH (c);
+
+      switch (c)
+	{
+	case '$':
+	  if (obscure_syntax & RE_TIGHT_VBAR)
+	    {
+	      if (! (obscure_syntax & RE_CONTEXT_INDEP_OPS) && p != pend)
+		goto normal_char;
+	      /* Make operand of last vbar end before this `$'.  */
+	      if (fixup_jump)
+		store_jump (fixup_jump, jump, b);
+	      fixup_jump = 0;
+	      PATPUSH (endline);
+	      break;
+	    }
+
+	  /* $ means succeed if at end of line, but only in special contexts.
+	    If randomly in the middle of a pattern, it is a normal character. */
+	  if (p == pend || *p == '\n'
+	      || (obscure_syntax & RE_CONTEXT_INDEP_OPS)
+	      || (obscure_syntax & RE_NO_BK_PARENS
+		  ? *p == ')'
+		  : *p == '\\' && p[1] == ')')
+	      || (obscure_syntax & RE_NO_BK_VBAR
+		  ? *p == '|'
+		  : *p == '\\' && p[1] == '|'))
+	    {
+	      PATPUSH (endline);
+	      break;
+	    }
+	  goto normal_char;
+
+	case '^':
+	  /* ^ means succeed if at beg of line, but only if no preceding pattern. */
+
+	  if (laststart && p[-2] != '\n'
+	      && ! (obscure_syntax & RE_CONTEXT_INDEP_OPS))
+	    goto normal_char;
+	  if (obscure_syntax & RE_TIGHT_VBAR)
+	    {
+	      if (p != pattern + 1
+		  && ! (obscure_syntax & RE_CONTEXT_INDEP_OPS))
+		goto normal_char;
+	      PATPUSH (begline);
+	      begalt = b;
+	    }
+	  else
+	    PATPUSH (begline);
+	  break;
+
+	case '+':
+	case '?':
+	  if (obscure_syntax & RE_BK_PLUS_QM)
+	    goto normal_char;
+	handle_plus:
+	case '*':
+	  /* If there is no previous pattern, char not special. */
+	  if (!laststart && ! (obscure_syntax & RE_CONTEXT_INDEP_OPS))
+	    goto normal_char;
+	  /* If there is a sequence of repetition chars,
+	     collapse it down to equivalent to just one.  */
+	  zero_times_ok = 0;
+	  many_times_ok = 0;
+	  while (1)
+	    {
+	      zero_times_ok |= c != '+';
+	      many_times_ok |= c != '?';
+	      if (p == pend)
+		break;
+	      PATFETCH (c);
+	      if (c == '*')
+		;
+	      else if (!(obscure_syntax & RE_BK_PLUS_QM)
+		       && (c == '+' || c == '?'))
+		;
+	      else if ((obscure_syntax & RE_BK_PLUS_QM)
+		       && c == '\\')
+		{
+		  int c1;
+		  PATFETCH (c1);
+		  if (!(c1 == '+' || c1 == '?'))
+		    {
+		      PATUNFETCH;
+		      PATUNFETCH;
+		      break;
+		    }
+		  c = c1;
+		}
+	      else
+		{
+		  PATUNFETCH;
+		  break;
+		}
+	    }
+
+	  /* Star, etc. applied to an empty pattern is equivalent
+	     to an empty pattern.  */
+	  if (!laststart)
+	    break;
+
+	  /* Now we know whether 0 matches is allowed,
+	     and whether 2 or more matches is allowed.  */
+	  if (many_times_ok)
+	    {
+	      /* If more than one repetition is allowed,
+		 put in a backward jump at the end.  */
+	      store_jump (b, maybe_finalize_jump, laststart - 3);
+	      b += 3;
+	    }
+	  insert_jump (on_failure_jump, laststart, b + 3, b);
+	  pending_exact = 0;
+	  b += 3;
+	  if (!zero_times_ok)
+	    {
+	      /* At least one repetition required: insert before the loop
+		 a skip over the initial on-failure-jump instruction */
+	      insert_jump (dummy_failure_jump, laststart, laststart + 6, b);
+	      b += 3;
+	    }
+	  break;
+
+	case '.':
+	  laststart = b;
+	  PATPUSH (anychar);
+	  break;
+
+	case '[':
+	  while (b - bufp->buffer
+		 > bufp->allocated - 3 - (1 << BYTEWIDTH) / BYTEWIDTH)
+	    /* Note that EXTEND_BUFFER clobbers c */
+	    EXTEND_BUFFER;
+
+	  laststart = b;
+	  if (*p == '^')
+	    PATPUSH (charset_not), p++;
+	  else
+	    PATPUSH (charset);
+	  p1 = p;
+
+	  PATPUSH ((1 << BYTEWIDTH) / BYTEWIDTH);
+	  /* Clear the whole map */
+	  bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH);
+	  /* Read in characters and ranges, setting map bits */
+	  while (1)
+	    {
+	      PATFETCH (c);
+	      if (c == ']' && p != p1 + 1) break;
+	      if (*p == '-' && p[1] != ']')
+		{
+		  PATFETCH (c1);
+		  PATFETCH (c1);
+		  while (c <= c1)
+		    b[c / BYTEWIDTH] |= 1 << (c % BYTEWIDTH), c++;
+		}
+	      else
+		{
+		  b[c / BYTEWIDTH] |= 1 << (c % BYTEWIDTH);
+		}
+	    }
+	  /* Discard any bitmap bytes that are all 0 at the end of the map.
+	     Decrement the map-length byte too. */
+	  while ((int) b[-1] > 0 && b[b[-1] - 1] == 0)
+	    b[-1]--;
+	  b += b[-1];
+	  break;
+
+	case '(':
+	  if (! (obscure_syntax & RE_NO_BK_PARENS))
+	    goto normal_char;
+	  else
+	    goto handle_open;
+
+	case ')':
+	  if (! (obscure_syntax & RE_NO_BK_PARENS))
+	    goto normal_char;
+	  else
+	    goto handle_close;
+
+	case '\n':
+	  if (! (obscure_syntax & RE_NEWLINE_OR))
+	    goto normal_char;
+	  else
+	    goto handle_bar;
+
+	case '|':
+	  if (! (obscure_syntax & RE_NO_BK_VBAR))
+	    goto normal_char;
+	  else
+	    goto handle_bar;
+
+        case '\\':
+	  if (p == pend) goto invalid_pattern;
+	  PATFETCH_RAW (c);
+	  switch (c)
+	    {
+	    case '(':
+	      if (obscure_syntax & RE_NO_BK_PARENS)
+		goto normal_backsl;
+	    handle_open:
+	      if (stackp == stacke) goto nesting_too_deep;
+	      if (regnum < RE_NREGS)
+	        {
+		  PATPUSH (start_memory);
+		  PATPUSH (regnum);
+	        }
+	      *stackp++ = b - bufp->buffer;
+	      *stackp++ = fixup_jump ? fixup_jump - bufp->buffer + 1 : 0;
+	      *stackp++ = regnum++;
+	      *stackp++ = begalt - bufp->buffer;
+	      fixup_jump = 0;
+	      laststart = 0;
+	      begalt = b;
+	      break;
+
+	    case ')':
+	      if (obscure_syntax & RE_NO_BK_PARENS)
+		goto normal_backsl;
+	    handle_close:
+	      if (stackp == stackb) goto unmatched_close;
+	      begalt = *--stackp + bufp->buffer;
+	      if (fixup_jump)
+		store_jump (fixup_jump, jump, b);
+	      if (stackp[-1] < RE_NREGS)
+		{
+		  PATPUSH (stop_memory);
+		  PATPUSH (stackp[-1]);
+		}
+	      stackp -= 2;
+	      fixup_jump = 0;
+	      if (*stackp)
+		fixup_jump = *stackp + bufp->buffer - 1;
+	      laststart = *--stackp + bufp->buffer;
+	      break;
+
+	    case '|':
+	      if (obscure_syntax & RE_NO_BK_VBAR)
+		goto normal_backsl;
+	    handle_bar:
+	      insert_jump (on_failure_jump, begalt, b + 6, b);
+	      pending_exact = 0;
+	      b += 3;
+	      if (fixup_jump)
+		store_jump (fixup_jump, jump, b);
+	      fixup_jump = b;
+	      b += 3;
+	      laststart = 0;
+	      begalt = b;
+	      break;
+
+#ifdef emacs
+	    case '=':
+	      PATPUSH (at_dot);
+	      break;
+
+	    case 's':	
+	      laststart = b;
+	      PATPUSH (syntaxspec);
+	      PATFETCH (c);
+	      PATPUSH (syntax_spec_code[c]);
+	      break;
+
+	    case 'S':
+	      laststart = b;
+	      PATPUSH (notsyntaxspec);
+	      PATFETCH (c);
+	      PATPUSH (syntax_spec_code[c]);
+	      break;
+#endif /* emacs */
+
+	    case 'w':
+	      laststart = b;
+	      PATPUSH (wordchar);
+	      break;
+
+	    case 'W':
+	      laststart = b;
+	      PATPUSH (notwordchar);
+	      break;
+
+	    case '<':
+	      PATPUSH (wordbeg);
+	      break;
+
+	    case '>':
+	      PATPUSH (wordend);
+	      break;
+
+	    case 'b':
+	      PATPUSH (wordbound);
+	      break;
+
+	    case 'B':
+	      PATPUSH (notwordbound);
+	      break;
+
+	    case '`':
+	      PATPUSH (begbuf);
+	      break;
+
+	    case '\'':
+	      PATPUSH (endbuf);
+	      break;
+
+	    case '1':
+	    case '2':
+	    case '3':
+	    case '4':
+	    case '5':
+	    case '6':
+	    case '7':
+	    case '8':
+	    case '9':
+	      c1 = c - '0';
+	      if (c1 >= regnum)
+		goto normal_char;
+	      for (stackt = stackp - 2;  stackt > stackb;  stackt -= 4)
+ 		if (*stackt == c1)
+		  goto normal_char;
+	      laststart = b;
+	      PATPUSH (duplicate);
+	      PATPUSH (c1);
+	      break;
+
+	    case '+':
+	    case '?':
+	      if (obscure_syntax & RE_BK_PLUS_QM)
+		goto handle_plus;
+
+	    default:
+	    normal_backsl:
+	      /* You might think it would be useful for \ to mean
+		 not to translate; but if we don't translate it
+		 it will never match anything.  */
+	      if (translate) c = translate[c];
+	      goto normal_char;
+	    }
+	  break;
+
+	default:
+	normal_char:
+	  if (!pending_exact || pending_exact + *pending_exact + 1 != b
+	      || *pending_exact == 0177 || *p == '*' || *p == '^'
+	      || ((obscure_syntax & RE_BK_PLUS_QM)
+		  ? *p == '\\' && (p[1] == '+' || p[1] == '?')
+		  : (*p == '+' || *p == '?')))
+	    {
+	      laststart = b;
+	      PATPUSH (exactn);
+	      pending_exact = b;
+	      PATPUSH (0);
+	    }
+	  PATPUSH (c);
+	  (*pending_exact)++;
+	}
+    }
+
+  if (fixup_jump)
+    store_jump (fixup_jump, jump, b);
+
+  if (stackp != stackb) goto unmatched_open;
+
+  bufp->used = b - bufp->buffer;
+  return 0;
+
+ invalid_pattern:
+  return "Invalid regular expression";
+
+ unmatched_open:
+  return "Unmatched \\(";
+
+ unmatched_close:
+  return "Unmatched \\)";
+
+ end_of_pattern:
+  return "Premature end of regular expression";
+
+ nesting_too_deep:
+  return "Nesting too deep";
+
+ too_big:
+  return "Regular expression too big";
+
+ memory_exhausted:
+  return "Memory exhausted";
+}
+
+/* Store where `from' points a jump operation to jump to where `to' points.
+  `opcode' is the opcode to store. */
+
+static void
+store_jump (from, opcode, to)
+     char *from, *to;
+     char opcode;
+{
+  from[0] = opcode;
+  from[1] = (to - (from + 3)) & 0377;
+  from[2] = (to - (from + 3)) >> 8;
+}
+
+/* Open up space at char FROM, and insert there a jump to TO.
+   CURRENT_END gives te end of the storage no in use,
+   so we know how much data to copy up.
+   OP is the opcode of the jump to insert.
+
+   If you call this function, you must zero out pending_exact.  */
+
+static void
+insert_jump (op, from, to, current_end)
+     char op;
+     char *from, *to, *current_end;
+{
+  register char *pto = current_end + 3;
+  register char *pfrom = current_end;
+  while (pfrom != from)
+    *--pto = *--pfrom;
+  store_jump (from, op, to);
+}
+
+/* Given a pattern, compute a fastmap from it.
+ The fastmap records which of the (1 << BYTEWIDTH) possible characters
+ can start a string that matches the pattern.
+ This fastmap is used by re_search to skip quickly over totally implausible text.
+
+ The caller must supply the address of a (1 << BYTEWIDTH)-byte data area
+ as bufp->fastmap.
+ The other components of bufp describe the pattern to be used.  */
+
+void
+re_compile_fastmap (bufp)
+     struct re_pattern_buffer *bufp;
+{
+  unsigned char *pattern = (unsigned char *) bufp->buffer;
+  int size = bufp->used;
+  register char *fastmap = bufp->fastmap;
+  register unsigned char *p = pattern;
+  register unsigned char *pend = pattern + size;
+  register int j, k;
+  unsigned char *translate = (unsigned char *) bufp->translate;
+
+  unsigned char *stackb[NFAILURES];
+  unsigned char **stackp = stackb;
+
+  bzero (fastmap, (1 << BYTEWIDTH));
+  bufp->fastmap_accurate = 1;
+  bufp->can_be_null = 0;
+      
+  while (p)
+    {
+      if (p == pend)
+	{
+	  bufp->can_be_null = 1;
+	  break;
+	}
+#ifdef SWITCH_ENUM_BUG
+      switch ((int) ((enum regexpcode) *p++))
+#else
+      switch ((enum regexpcode) *p++)
+#endif
+	{
+	case exactn:
+	  if (translate)
+	    fastmap[translate[p[1]]] = 1;
+	  else
+	    fastmap[p[1]] = 1;
+	  break;
+
+        case begline:
+        case before_dot:
+	case at_dot:
+	case after_dot:
+	case begbuf:
+	case endbuf:
+	case wordbound:
+	case notwordbound:
+	case wordbeg:
+	case wordend:
+	  continue;
+
+	case endline:
+	  if (translate)
+	    fastmap[translate['\n']] = 1;
+	  else
+	    fastmap['\n'] = 1;
+	  if (bufp->can_be_null != 1)
+	    bufp->can_be_null = 2;
+	  break;
+
+	case finalize_jump:
+	case maybe_finalize_jump:
+	case jump:
+	case dummy_failure_jump:
+	  bufp->can_be_null = 1;
+	  j = *p++ & 0377;
+	  j += SIGN_EXTEND_CHAR (*(char *)p) << 8;
+	  p += j + 1;		/* The 1 compensates for missing ++ above */
+	  if (j > 0)
+	    continue;
+	  /* Jump backward reached implies we just went through
+	     the body of a loop and matched nothing.
+	     Opcode jumped to should be an on_failure_jump.
+	     Just treat it like an ordinary jump.
+	     For a * loop, it has pushed its failure point already;
+	     if so, discard that as redundant.  */
+	  if ((enum regexpcode) *p != on_failure_jump)
+	    continue;
+	  p++;
+	  j = *p++ & 0377;
+	  j += SIGN_EXTEND_CHAR (*(char *)p) << 8;
+	  p += j + 1;		/* The 1 compensates for missing ++ above */
+	  if (stackp != stackb && *stackp == p)
+	    stackp--;
+	  continue;
+	  
+	case on_failure_jump:
+	  j = *p++ & 0377;
+	  j += SIGN_EXTEND_CHAR (*(char *)p) << 8;
+	  p++;
+	  *++stackp = p + j;
+	  continue;
+
+	case start_memory:
+	case stop_memory:
+	  p++;
+	  continue;
+
+	case duplicate:
+	  bufp->can_be_null = 1;
+	  fastmap['\n'] = 1;
+	case anychar:
+	  for (j = 0; j < (1 << BYTEWIDTH); j++)
+	    if (j != '\n')
+	      fastmap[j] = 1;
+	  if (bufp->can_be_null)
+	    return;
+	  /* Don't return; check the alternative paths
+	     so we can set can_be_null if appropriate.  */
+	  break;
+
+	case wordchar:
+	  for (j = 0; j < (1 << BYTEWIDTH); j++)
+	    if (SYNTAX (j) == Sword)
+	      fastmap[j] = 1;
+	  break;
+
+	case notwordchar:
+	  for (j = 0; j < (1 << BYTEWIDTH); j++)
+	    if (SYNTAX (j) != Sword)
+	      fastmap[j] = 1;
+	  break;
+
+#ifdef emacs
+	case syntaxspec:
+	  k = *p++;
+	  for (j = 0; j < (1 << BYTEWIDTH); j++)
+	    if (SYNTAX (j) == (enum syntaxcode) k)
+	      fastmap[j] = 1;
+	  break;
+
+	case notsyntaxspec:
+	  k = *p++;
+	  for (j = 0; j < (1 << BYTEWIDTH); j++)
+	    if (SYNTAX (j) != (enum syntaxcode) k)
+	      fastmap[j] = 1;
+	  break;
+#endif /* emacs */
+
+	case charset:
+	  for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
+	    if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))
+	      {
+		if (translate)
+		  fastmap[translate[j]] = 1;
+		else
+		  fastmap[j] = 1;
+	      }
+	  break;
+
+	case charset_not:
+	  /* Chars beyond end of map must be allowed */
+	  for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++)
+	    if (translate)
+	      fastmap[translate[j]] = 1;
+	    else
+	      fastmap[j] = 1;
+
+	  for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
+	    if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))))
+	      {
+		if (translate)
+		  fastmap[translate[j]] = 1;
+		else
+		  fastmap[j] = 1;
+	      }
+	  break;
+
+	default:
+	  break;
+	}
+
+      /* Get here means we have successfully found the possible starting characters
+	 of one path of the pattern.  We need not follow this path any farther.
+	 Instead, look at the next alternative remembered in the stack. */
+      if (stackp != stackb)
+	p = *stackp--;
+      else
+	break;
+    }
+}
+
+/* Like re_search_2, below, but only one string is specified. */
+
+int
+re_search (pbufp, string, size, startpos, range, regs)
+     struct re_pattern_buffer *pbufp;
+     char *string;
+     int size, startpos, range;
+     struct re_registers *regs;
+{
+  return re_search_2 (pbufp, 0, 0, string, size, startpos, range, regs, size);
+}
+
+/* Like re_match_2 but tries first a match starting at index STARTPOS,
+   then at STARTPOS + 1, and so on.
+   RANGE is the number of places to try before giving up.
+   If RANGE is negative, the starting positions tried are
+    STARTPOS, STARTPOS - 1, etc.
+   It is up to the caller to make sure that range is not so large
+   as to take the starting position outside of the input strings.
+
+The value returned is the position at which the match was found,
+ or -1 if no match was found,
+ or -2 if error (such as failure stack overflow).  */
+
+int
+re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, mstop)
+     struct re_pattern_buffer *pbufp;
+     char *string1, *string2;
+     int size1, size2;
+     int startpos;
+     register int range;
+     struct re_registers *regs;
+     int mstop;
+{
+  register char *fastmap = pbufp->fastmap;
+  register unsigned char *translate = (unsigned char *) pbufp->translate;
+  int total = size1 + size2;
+  int val;
+
+  /* Update the fastmap now if not correct already */
+  if (fastmap && !pbufp->fastmap_accurate)
+    re_compile_fastmap (pbufp);
+  
+  /* Don't waste time in a long search for a pattern
+     that says it is anchored.  */
+  if (pbufp->used > 0 && (enum regexpcode) pbufp->buffer[0] == begbuf
+      && range > 0)
+    {
+      if (startpos > 0)
+	return -1;
+      else
+	range = 1;
+    }
+
+  while (1)
+    {
+      /* If a fastmap is supplied, skip quickly over characters
+	 that cannot possibly be the start of a match.
+	 Note, however, that if the pattern can possibly match
+	 the null string, we must test it at each starting point
+	 so that we take the first null string we get.  */
+
+      if (fastmap && startpos < total && pbufp->can_be_null != 1)
+	{
+	  if (range > 0)
+	    {
+	      register int lim = 0;
+	      register unsigned char *p;
+	      int irange = range;
+	      if (startpos < size1 && startpos + range >= size1)
+		lim = range - (size1 - startpos);
+
+	      p = ((unsigned char *)
+		   &(startpos >= size1 ? string2 - size1 : string1)[startpos]);
+
+	      if (translate)
+		{
+		  while (range > lim && !fastmap[translate[*p++]])
+		    range--;
+		}
+	      else
+		{
+		  while (range > lim && !fastmap[*p++])
+		    range--;
+		}
+	      startpos += irange - range;
+	    }
+	  else
+	    {
+	      register unsigned char c;
+	      if (startpos >= size1)
+		c = string2[startpos - size1];
+	      else
+		c = string1[startpos];
+	      c &= 0xff;
+	      if (translate ? !fastmap[translate[c]] : !fastmap[c])
+		goto advance;
+	    }
+	}
+
+      if (range >= 0 && startpos == total
+	  && fastmap && pbufp->can_be_null == 0)
+	return -1;
+
+      val = re_match_2 (pbufp, string1, size1, string2, size2, startpos, regs, mstop);
+      if (0 <= val)
+	{
+	  if (val == -2)
+	    return -2;
+	  return startpos;
+	}
+
+#ifdef C_ALLOCA
+      alloca (0);
+#endif /* C_ALLOCA */
+
+    advance:
+      if (!range) break;
+      if (range > 0) range--, startpos++; else range++, startpos--;
+    }
+  return -1;
+}
+
+#ifndef emacs   /* emacs never uses this */
+int
+re_match (pbufp, string, size, pos, regs)
+     struct re_pattern_buffer *pbufp;
+     char *string;
+     int size, pos;
+     struct re_registers *regs;
+{
+  return re_match_2 (pbufp, 0, 0, string, size, pos, regs, size);
+}
+#endif /* emacs */
+
+/* Maximum size of failure stack.  Beyond this, overflow is an error.  */
+
+int re_max_failures = 2000;
+
+static int bcmp_translate();
+/* Match the pattern described by PBUFP
+   against data which is the virtual concatenation of STRING1 and STRING2.
+   SIZE1 and SIZE2 are the sizes of the two data strings.
+   Start the match at position POS.
+   Do not consider matching past the position MSTOP.
+
+   If pbufp->fastmap is nonzero, then it had better be up to date.
+
+   The reason that the data to match are specified as two components
+   which are to be regarded as concatenated
+   is so this function can be used directly on the contents of an Emacs buffer.
+
+   -1 is returned if there is no match.  -2 is returned if there is
+   an error (such as match stack overflow).  Otherwise the value is the length
+   of the substring which was matched.  */
+
+int
+re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
+     struct re_pattern_buffer *pbufp;
+     unsigned char *string1, *string2;
+     int size1, size2;
+     int pos;
+     struct re_registers *regs;
+     int mstop;
+{
+  register unsigned char *p = (unsigned char *) pbufp->buffer;
+  register unsigned char *pend = p + pbufp->used;
+  /* End of first string */
+  unsigned char *end1;
+  /* End of second string */
+  unsigned char *end2;
+  /* Pointer just past last char to consider matching */
+  unsigned char *end_match_1, *end_match_2;
+  register unsigned char *d, *dend;
+  register int mcnt;
+  unsigned char *translate = (unsigned char *) pbufp->translate;
+
+ /* Failure point stack.  Each place that can handle a failure further down the line
+    pushes a failure point on this stack.  It consists of two char *'s.
+    The first one pushed is where to resume scanning the pattern;
+    the second pushed is where to resume scanning the strings.
+    If the latter is zero, the failure point is a "dummy".
+    If a failure happens and the innermost failure point is dormant,
+    it discards that failure point and tries the next one. */
+
+  unsigned char *initial_stack[2 * NFAILURES];
+  unsigned char **stackb = initial_stack;
+  unsigned char **stackp = stackb, **stacke = &stackb[2 * NFAILURES];
+
+  /* Information on the "contents" of registers.
+     These are pointers into the input strings; they record
+     just what was matched (on this attempt) by some part of the pattern.
+     The start_memory command stores the start of a register's contents
+     and the stop_memory command stores the end.
+
+     At that point, regstart[regnum] points to the first character in the register,
+     regend[regnum] points to the first character beyond the end of the register,
+     regstart_seg1[regnum] is true iff regstart[regnum] points into string1,
+     and regend_seg1[regnum] is true iff regend[regnum] points into string1.  */
+
+  unsigned char *regstart[RE_NREGS];
+  unsigned char *regend[RE_NREGS];
+  unsigned char regstart_seg1[RE_NREGS], regend_seg1[RE_NREGS];
+
+  /* Set up pointers to ends of strings.
+     Don't allow the second string to be empty unless both are empty.  */
+  if (!size2)
+    {
+      string2 = string1;
+      size2 = size1;
+      string1 = 0;
+      size1 = 0;
+    }
+  end1 = string1 + size1;
+  end2 = string2 + size2;
+
+  /* Compute where to stop matching, within the two strings */
+  if (mstop <= size1)
+    {
+      end_match_1 = string1 + mstop;
+      end_match_2 = string2;
+    }
+  else
+    {
+      end_match_1 = end1;
+      end_match_2 = string2 + mstop - size1;
+    }
+
+  /* Initialize \) text positions to -1
+     to mark ones that no \( or \) has been seen for.  */
+
+  for (mcnt = 0; mcnt < sizeof (regend) / sizeof (*regend); mcnt++)
+    regend[mcnt] = (unsigned char *) -1;
+
+  /* `p' scans through the pattern as `d' scans through the data.
+     `dend' is the end of the input string that `d' points within.
+     `d' is advanced into the following input string whenever necessary,
+     but this happens before fetching;
+     therefore, at the beginning of the loop,
+     `d' can be pointing at the end of a string,
+     but it cannot equal string2.  */
+
+  if (pos <= size1)
+    d = string1 + pos, dend = end_match_1;
+  else
+    d = string2 + pos - size1, dend = end_match_2;
+
+/* Write PREFETCH; just before fetching a character with *d.  */
+#define PREFETCH \
+ while (d == dend)						    \
+  { if (dend == end_match_2) goto fail;  /* end of string2 => failure */   \
+    d = string2;  /* end of string1 => advance to string2. */       \
+    dend = end_match_2; }
+
+  /* This loop loops over pattern commands.
+     It exits by returning from the function if match is complete,
+     or it drops through if match fails at this starting point in the input data. */
+
+  while (1)
+    {
+      if (p == pend)
+	/* End of pattern means we have succeeded! */
+	{
+	  /* If caller wants register contents data back, convert it to indices */
+	  if (regs)
+	    {
+ 	      regs->start[0] = pos;
+ 	      if (dend == end_match_1)
+ 		regs->end[0] = d - string1;
+ 	      else
+ 		regs->end[0] = d - string2 + size1;
+ 	      for (mcnt = 1; mcnt < RE_NREGS; mcnt++)
+		{
+		  if (regend[mcnt] == (unsigned char *) -1)
+		    {
+		      regs->start[mcnt] = -1;
+		      regs->end[mcnt] = -1;
+		      continue;
+		    }
+ 		  if (regstart_seg1[mcnt])
+		    regs->start[mcnt] = regstart[mcnt] - string1;
+		  else
+		    regs->start[mcnt] = regstart[mcnt] - string2 + size1;
+ 		  if (regend_seg1[mcnt])
+		    regs->end[mcnt] = regend[mcnt] - string1;
+		  else
+		    regs->end[mcnt] = regend[mcnt] - string2 + size1;
+		}
+	    }
+ 	  if (dend == end_match_1)
+	    return (d - string1 - pos);
+	  else
+	    return d - string2 + size1 - pos;
+	}
+
+      /* Otherwise match next pattern command */
+#ifdef SWITCH_ENUM_BUG
+      switch ((int) ((enum regexpcode) *p++))
+#else
+      switch ((enum regexpcode) *p++)
+#endif
+	{
+
+	/* \( is represented by a start_memory, \) by a stop_memory.
+	    Both of those commands contain a "register number" argument.
+	    The text matched within the \( and \) is recorded under that number.
+	    Then, \<digit> turns into a `duplicate' command which
+	    is followed by the numeric value of <digit> as the register number. */
+
+	case start_memory:
+	  regstart[*p] = d;
+ 	  regstart_seg1[*p++] = (dend == end_match_1);
+	  break;
+
+	case stop_memory:
+	  regend[*p] = d;
+ 	  regend_seg1[*p++] = (dend == end_match_1);
+	  break;
+
+	case duplicate:
+	  {
+	    int regno = *p++;   /* Get which register to match against */
+	    register unsigned char *d2, *dend2;
+
+	    d2 = regstart[regno];
+ 	    dend2 = ((regstart_seg1[regno] == regend_seg1[regno])
+		     ? regend[regno] : end_match_1);
+	    while (1)
+	      {
+		/* Advance to next segment in register contents, if necessary */
+		while (d2 == dend2)
+		  {
+		    if (dend2 == end_match_2) break;
+		    if (dend2 == regend[regno]) break;
+		    d2 = string2, dend2 = regend[regno];  /* end of string1 => advance to string2. */
+		  }
+		/* At end of register contents => success */
+		if (d2 == dend2) break;
+
+		/* Advance to next segment in data being matched, if necessary */
+		PREFETCH;
+
+		/* mcnt gets # consecutive chars to compare */
+		mcnt = dend - d;
+		if (mcnt > dend2 - d2)
+		  mcnt = dend2 - d2;
+		/* Compare that many; failure if mismatch, else skip them. */
+		if (translate ? bcmp_translate (d, d2, mcnt, translate) : bcmp (d, d2, mcnt))
+		  goto fail;
+		d += mcnt, d2 += mcnt;
+	      }
+	  }
+	  break;
+
+	case anychar:
+	  /* fetch a data character */
+	  PREFETCH;
+	  /* Match anything but a newline.  */
+	  if ((translate ? translate[*d++] : *d++) == '\n')
+	    goto fail;
+	  break;
+
+	case charset:
+	case charset_not:
+	  {
+	    /* Nonzero for charset_not */
+	    int not = 0;
+	    register int c;
+	    if (*(p - 1) == (unsigned char) charset_not)
+	      not = 1;
+
+	    /* fetch a data character */
+	    PREFETCH;
+
+	    if (translate)
+	      c = translate [*d];
+	    else
+	      c = *d;
+
+	    if (c < *p * BYTEWIDTH
+		&& p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
+	      not = !not;
+
+	    p += 1 + *p;
+
+	    if (!not) goto fail;
+	    d++;
+	    break;
+	  }
+
+	case begline:
+	  if (d == string1 || d[-1] == '\n')
+	    break;
+	  goto fail;
+
+	case endline:
+	  if (d == end2
+	      || (d == end1 ? (size2 == 0 || *string2 == '\n') : *d == '\n'))
+	    break;
+	  goto fail;
+
+	/* "or" constructs ("|") are handled by starting each alternative
+	    with an on_failure_jump that points to the start of the next alternative.
+	    Each alternative except the last ends with a jump to the joining point.
+	    (Actually, each jump except for the last one really jumps
+	     to the following jump, because tensioning the jumps is a hassle.) */
+
+	/* The start of a stupid repeat has an on_failure_jump that points
+	   past the end of the repeat text.
+	   This makes a failure point so that, on failure to match a repetition,
+	   matching restarts past as many repetitions have been found
+	   with no way to fail and look for another one.  */
+
+	/* A smart repeat is similar but loops back to the on_failure_jump
+	   so that each repetition makes another failure point. */
+
+	case on_failure_jump:
+	  if (stackp == stacke)
+	    {
+	      unsigned char **stackx;
+	      if (stacke - stackb > re_max_failures * 2)
+		return -2;
+	      stackx = (unsigned char **) alloca (2 * (stacke - stackb)
+					 * sizeof (char *));
+	      bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *));
+	      stackp = stackx + (stackp - stackb);
+	      stacke = stackx + 2 * (stacke - stackb);
+	      stackb = stackx;
+	    }
+	  mcnt = *p++ & 0377;
+	  mcnt += SIGN_EXTEND_CHAR (*(char *)p) << 8;
+	  p++;
+	  *stackp++ = mcnt + p;
+	  *stackp++ = d;
+	  break;
+
+	/* The end of a smart repeat has an maybe_finalize_jump back.
+	   Change it either to a finalize_jump or an ordinary jump. */
+
+	case maybe_finalize_jump:
+	  mcnt = *p++ & 0377;
+	  mcnt += SIGN_EXTEND_CHAR (*(char *)p) << 8;
+	  p++;
+	  {
+	    register unsigned char *p2 = p;
+	    /* Compare what follows with the begining of the repeat.
+	       If we can establish that there is nothing that they would
+	       both match, we can change to finalize_jump */
+	    while (p2 != pend
+		   && (*p2 == (unsigned char) stop_memory
+		       || *p2 == (unsigned char) start_memory))
+	      p2++;
+	    if (p2 == pend)
+	      p[-3] = (unsigned char) finalize_jump;
+	    else if (*p2 == (unsigned char) exactn
+		     || *p2 == (unsigned char) endline)
+	      {
+		register int c = *p2 == (unsigned char) endline ? '\n' : p2[2];
+		register unsigned char *p1 = p + mcnt;
+		/* p1[0] ... p1[2] are an on_failure_jump.
+		   Examine what follows that */
+		if (p1[3] == (unsigned char) exactn && p1[5] != c)
+		  p[-3] = (unsigned char) finalize_jump;
+		else if (p1[3] == (unsigned char) charset
+			 || p1[3] == (unsigned char) charset_not)
+		  {
+		    int not = p1[3] == (unsigned char) charset_not;
+		    if (c < p1[4] * BYTEWIDTH
+			&& p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
+		      not = !not;
+		    /* not is 1 if c would match */
+		    /* That means it is not safe to finalize */
+		    if (!not)
+		      p[-3] = (unsigned char) finalize_jump;
+		  }
+	      }
+	  }
+	  p -= 2;
+	  if (p[-1] != (unsigned char) finalize_jump)
+	    {
+	      p[-1] = (unsigned char) jump;
+	      goto nofinalize;
+	    }
+
+	/* The end of a stupid repeat has a finalize-jump
+	   back to the start, where another failure point will be made
+	   which will point after all the repetitions found so far. */
+
+	case finalize_jump:
+	  stackp -= 2;
+
+	case jump:
+	nofinalize:
+	  mcnt = *p++ & 0377;
+	  mcnt += SIGN_EXTEND_CHAR (*(char *)p) << 8;
+	  p += mcnt + 1;	/* The 1 compensates for missing ++ above */
+	  break;
+
+	case dummy_failure_jump:
+	  if (stackp == stacke)
+	    {
+	      unsigned char **stackx
+		= (unsigned char **) alloca (2 * (stacke - stackb)
+					     * sizeof (char *));
+	      bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *));
+	      stackp = stackx + (stackp - stackb);
+	      stacke = stackx + 2 * (stacke - stackb);
+	      stackb = stackx;
+	    }
+	  *stackp++ = 0;
+	  *stackp++ = 0;
+	  goto nofinalize;
+
+	case wordbound:
+	  if (d == string1  /* Points to first char */
+	      || d == end2  /* Points to end */
+	      || (d == end1 && size2 == 0)) /* Points to end */
+	    break;
+	  if ((SYNTAX (d[-1]) == Sword)
+	      != (SYNTAX (d == end1 ? *string2 : *d) == Sword))
+	    break;
+	  goto fail;
+
+	case notwordbound:
+	  if (d == string1  /* Points to first char */
+	      || d == end2  /* Points to end */
+	      || (d == end1 && size2 == 0)) /* Points to end */
+	    goto fail;
+	  if ((SYNTAX (d[-1]) == Sword)
+	      != (SYNTAX (d == end1 ? *string2 : *d) == Sword))
+	    goto fail;
+	  break;
+
+	case wordbeg:
+	  if (d == end2  /* Points to end */
+	      || (d == end1 && size2 == 0) /* Points to end */
+	      || SYNTAX (* (d == end1 ? string2 : d)) != Sword) /* Next char not a letter */
+	    goto fail;
+	  if (d == string1  /* Points to first char */
+	      || SYNTAX (d[-1]) != Sword)  /* prev char not letter */
+	    break;
+	  goto fail;
+
+	case wordend:
+	  if (d == string1  /* Points to first char */
+	      || SYNTAX (d[-1]) != Sword)  /* prev char not letter */
+	    goto fail;
+	  if (d == end2  /* Points to end */
+	      || (d == end1 && size2 == 0) /* Points to end */
+	      || SYNTAX (d == end1 ? *string2 : *d) != Sword) /* Next char not a letter */
+	    break;
+	  goto fail;
+
+#ifdef emacs
+	case before_dot:
+	  if (((d - string2 <= (unsigned) size2)
+	       ? d - bf_p2 : d - bf_p1)
+	      <= point)
+	    goto fail;
+	  break;
+
+	case at_dot:
+	  if (((d - string2 <= (unsigned) size2)
+	       ? d - bf_p2 : d - bf_p1)
+	      == point)
+	    goto fail;
+	  break;
+
+	case after_dot:
+	  if (((d - string2 <= (unsigned) size2)
+	       ? d - bf_p2 : d - bf_p1)
+	      >= point)
+	    goto fail;
+	  break;
+
+	case wordchar:
+	  mcnt = (int) Sword;
+	  goto matchsyntax;
+
+	case syntaxspec:
+	  mcnt = *p++;
+	matchsyntax:
+	  PREFETCH;
+	  if (SYNTAX (*d++) != (enum syntaxcode) mcnt) goto fail;
+	  break;
+	  
+	case notwordchar:
+	  mcnt = (int) Sword;
+	  goto matchnotsyntax;
+
+	case notsyntaxspec:
+	  mcnt = *p++;
+	matchnotsyntax:
+	  PREFETCH;
+	  if (SYNTAX (*d++) == (enum syntaxcode) mcnt) goto fail;
+	  break;
+#else
+	case wordchar:
+	  PREFETCH;
+	  if (SYNTAX (*d++) == 0) goto fail;
+	  break;
+	  
+	case notwordchar:
+	  PREFETCH;
+	  if (SYNTAX (*d++) != 0) goto fail;
+	  break;
+#endif /* not emacs */
+
+	case begbuf:
+	  if (d == string1)	/* Note, d cannot equal string2 */
+	    break;		/* unless string1 == string2.  */
+	  goto fail;
+
+	case endbuf:
+	  if (d == end2 || (d == end1 && size2 == 0))
+	    break;
+	  goto fail;
+
+	case exactn:
+	  /* Match the next few pattern characters exactly.
+	     mcnt is how many characters to match. */
+	  mcnt = *p++;
+	  if (translate)
+	    {
+	      do
+		{
+		  PREFETCH;
+		  if (translate[*d++] != *p++) goto fail;
+		}
+	      while (--mcnt);
+	    }
+	  else
+	    {
+	      do
+		{
+		  PREFETCH;
+		  if (*d++ != *p++) goto fail;
+		}
+	      while (--mcnt);
+	    }
+	  break;
+
+	default:
+	  break;
+	}
+      continue;    /* Successfully matched one pattern command; keep matching */
+
+      /* Jump here if any matching operation fails. */
+    fail:
+      if (stackp != stackb)
+	/* A restart point is known.  Restart there and pop it. */
+	{
+	  if (!stackp[-2])
+	    {   /* If innermost failure point is dormant, flush it and keep looking */
+	      stackp -= 2;
+	      goto fail;
+	    }
+	  d = *--stackp;
+	  p = *--stackp;
+	  if (d >= string1 && d <= end1)
+	    dend = end_match_1;
+	}
+      else break;   /* Matching at this starting point really fails! */
+    }
+  return -1;         /* Failure to match */
+}
+
+static int
+bcmp_translate (s1, s2, len, translate)
+     unsigned char *s1, *s2;
+     register int len;
+     unsigned char *translate;
+{
+  register unsigned char *p1 = s1, *p2 = s2;
+  while (len)
+    {
+      if (translate [*p1++] != translate [*p2++]) return 1;
+      len--;
+    }
+  return 0;
+}
+
+/* Entry points compatible with bsd4.2 regex library */
+
+#ifndef emacs
+
+static struct re_pattern_buffer re_comp_buf;
+
+char *
+re_comp (s)
+     char *s;
+{
+  if (!s)
+    {
+      if (!re_comp_buf.buffer)
+	return "No previous regular expression";
+      return 0;
+    }
+
+  if (!re_comp_buf.buffer)
+    {
+      if (!(re_comp_buf.buffer = (char *) malloc (200)))
+	return "Memory exhausted";
+      re_comp_buf.allocated = 200;
+      if (!(re_comp_buf.fastmap = (char *) malloc (1 << BYTEWIDTH)))
+	return "Memory exhausted";
+    }
+  return re_compile_pattern (s, strlen (s), &re_comp_buf);
+}
+
+int
+re_exec (s)
+     char *s;
+{
+  int len = strlen (s);
+  return 0 <= re_search (&re_comp_buf, s, len, 0, len, 0);
+}
+
+#endif /* emacs */
+
+#ifdef test
+
+#include <stdio.h>
+
+/* Indexed by a character, gives the upper case equivalent of the character */
+
+static char upcase[0400] = 
+  { 000, 001, 002, 003, 004, 005, 006, 007,
+    010, 011, 012, 013, 014, 015, 016, 017,
+    020, 021, 022, 023, 024, 025, 026, 027,
+    030, 031, 032, 033, 034, 035, 036, 037,
+    040, 041, 042, 043, 044, 045, 046, 047,
+    050, 051, 052, 053, 054, 055, 056, 057,
+    060, 061, 062, 063, 064, 065, 066, 067,
+    070, 071, 072, 073, 074, 075, 076, 077,
+    0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
+    0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
+    0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
+    0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137,
+    0140, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
+    0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
+    0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
+    0130, 0131, 0132, 0173, 0174, 0175, 0176, 0177,
+    0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207,
+    0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217,
+    0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227,
+    0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237,
+    0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247,
+    0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257,
+    0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267,
+    0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277,
+    0300, 0301, 0302, 0303, 0304, 0305, 0306, 0307,
+    0310, 0311, 0312, 0313, 0314, 0315, 0316, 0317,
+    0320, 0321, 0322, 0323, 0324, 0325, 0326, 0327,
+    0330, 0331, 0332, 0333, 0334, 0335, 0336, 0337,
+    0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347,
+    0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357,
+    0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367,
+    0370, 0371, 0372, 0373, 0374, 0375, 0376, 0377
+  };
+
+main (argc, argv)
+     int argc;
+     char **argv;
+{
+  char pat[80];
+  struct re_pattern_buffer buf;
+  int i;
+  char c;
+  char fastmap[(1 << BYTEWIDTH)];
+
+  /* Allow a command argument to specify the style of syntax.  */
+  if (argc > 1)
+    obscure_syntax = atoi (argv[1]);
+
+  buf.allocated = 40;
+  buf.buffer = (char *) malloc (buf.allocated);
+  buf.fastmap = fastmap;
+  buf.translate = upcase;
+
+  while (1)
+    {
+      gets (pat);
+
+      if (*pat)
+	{
+          re_compile_pattern (pat, strlen(pat), &buf);
+
+	  for (i = 0; i < buf.used; i++)
+	    printchar (buf.buffer[i]);
+
+	  putchar ('\n');
+
+	  printf ("%d allocated, %d used.\n", buf.allocated, buf.used);
+
+	  re_compile_fastmap (&buf);
+	  printf ("Allowed by fastmap: ");
+	  for (i = 0; i < (1 << BYTEWIDTH); i++)
+	    if (fastmap[i]) printchar (i);
+	  putchar ('\n');
+	}
+
+      gets (pat);	/* Now read the string to match against */
+
+      i = re_match (&buf, pat, strlen (pat), 0, 0);
+      printf ("Match value %d.\n", i);
+    }
+}
+
+#ifdef NOTDEF
+print_buf (bufp)
+     struct re_pattern_buffer *bufp;
+{
+  int i;
+
+  printf ("buf is :\n----------------\n");
+  for (i = 0; i < bufp->used; i++)
+    printchar (bufp->buffer[i]);
+  
+  printf ("\n%d allocated, %d used.\n", bufp->allocated, bufp->used);
+  
+  printf ("Allowed by fastmap: ");
+  for (i = 0; i < (1 << BYTEWIDTH); i++)
+    if (bufp->fastmap[i])
+      printchar (i);
+  printf ("\nAllowed by translate: ");
+  if (bufp->translate)
+    for (i = 0; i < (1 << BYTEWIDTH); i++)
+      if (bufp->translate[i])
+	printchar (i);
+  printf ("\nfastmap is%s accurate\n", bufp->fastmap_accurate ? "" : "n't");
+  printf ("can %s be null\n----------", bufp->can_be_null ? "" : "not");
+}
+#endif
+
+printchar (c)
+     char c;
+{
+  if (c < 041 || c >= 0177)
+    {
+      putchar ('\\');
+      putchar (((c >> 6) & 3) + '0');
+      putchar (((c >> 3) & 7) + '0');
+      putchar ((c & 7) + '0');
+    }
+  else
+    putchar (c);
+}
+
+error (string)
+     char *string;
+{
+  puts (string);
+  exit (1);
+}
+
+#endif /* test */
diff --git a/gnu/usr.bin/grep/regex.h b/gnu/usr.bin/grep/regex.h
new file mode 100644
index 000000000000..a5e89d32ccb3
--- /dev/null
+++ b/gnu/usr.bin/grep/regex.h
@@ -0,0 +1,185 @@
+/* Definitions for data structures callers pass the regex library.
+   Copyright (C) 1985, 1989 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+
+   In other words, you are welcome to use, share and improve this program.
+   You are forbidden to forbid anyone else to use, share and improve
+   what you give them.   Help stamp out software-hoarding!  */
+
+
+/* Define number of parens for which we record the beginnings and ends.
+   This affects how much space the `struct re_registers' type takes up.  */
+#ifndef RE_NREGS
+#define RE_NREGS 10
+#endif
+
+/* These bits are used in the obscure_syntax variable to choose among
+   alternative regexp syntaxes.  */
+
+/* 1 means plain parentheses serve as grouping, and backslash
+     parentheses are needed for literal searching.
+   0 means backslash-parentheses are grouping, and plain parentheses
+     are for literal searching.  */
+#define RE_NO_BK_PARENS 1
+
+/* 1 means plain | serves as the "or"-operator, and \| is a literal.
+   0 means \| serves as the "or"-operator, and | is a literal.  */
+#define RE_NO_BK_VBAR 2
+
+/* 0 means plain + or ? serves as an operator, and \+, \? are literals.
+   1 means \+, \? are operators and plain +, ? are literals.  */
+#define RE_BK_PLUS_QM 4
+
+/* 1 means | binds tighter than ^ or $.
+   0 means the contrary.  */
+#define RE_TIGHT_VBAR 8
+
+/* 1 means treat \n as an _OR operator
+   0 means treat it as a normal character */
+#define RE_NEWLINE_OR 16
+
+/* 0 means that a special characters (such as *, ^, and $) always have
+     their special meaning regardless of the surrounding context.
+   1 means that special characters may act as normal characters in some
+     contexts.  Specifically, this applies to:
+	^ - only special at the beginning, or after ( or |
+	$ - only special at the end, or before ) or |
+	*, +, ? - only special when not after the beginning, (, or | */
+#define RE_CONTEXT_INDEP_OPS 32
+
+/* Now define combinations of bits for the standard possibilities.  */
+#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
+#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
+#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
+#define RE_SYNTAX_EMACS 0
+
+/* This data structure is used to represent a compiled pattern. */
+
+struct re_pattern_buffer
+  {
+    char *buffer;	/* Space holding the compiled pattern commands. */
+    int allocated;	/* Size of space that  buffer  points to */
+    int used;		/* Length of portion of buffer actually occupied */
+    char *fastmap;	/* Pointer to fastmap, if any, or zero if none. */
+			/* re_search uses the fastmap, if there is one,
+			   to skip quickly over totally implausible characters */
+    char *translate;	/* Translate table to apply to all characters before comparing.
+			   Or zero for no translation.
+			   The translation is applied to a pattern when it is compiled
+			   and to data when it is matched. */
+    char fastmap_accurate;
+			/* Set to zero when a new pattern is stored,
+			   set to one when the fastmap is updated from it. */
+    char can_be_null;   /* Set to one by compiling fastmap
+			   if this pattern might match the null string.
+			   It does not necessarily match the null string
+			   in that case, but if this is zero, it cannot.
+			   2 as value means can match null string
+			   but at end of range or before a character
+			   listed in the fastmap.  */
+  };
+
+/* Structure to store "register" contents data in.
+
+   Pass the address of such a structure as an argument to re_match, etc.,
+   if you want this information back.
+
+   start[i] and end[i] record the string matched by \( ... \) grouping i,
+   for i from 1 to RE_NREGS - 1.
+   start[0] and end[0] record the entire string matched. */
+
+struct re_registers
+  {
+    int start[RE_NREGS];
+    int end[RE_NREGS];
+  };
+
+/* These are the command codes that appear in compiled regular expressions, one per byte.
+  Some command codes are followed by argument bytes.
+  A command code can specify any interpretation whatever for its arguments.
+  Zero-bytes may appear in the compiled regular expression. */
+
+enum regexpcode
+  {
+    unused,
+    exactn,    /* followed by one byte giving n, and then by n literal bytes */
+    begline,   /* fails unless at beginning of line */
+    endline,   /* fails unless at end of line */
+    jump,	 /* followed by two bytes giving relative address to jump to */
+    on_failure_jump,	 /* followed by two bytes giving relative address of place
+		            to resume at in case of failure. */
+    finalize_jump,	 /* Throw away latest failure point and then jump to address. */
+    maybe_finalize_jump, /* Like jump but finalize if safe to do so.
+			    This is used to jump back to the beginning
+			    of a repeat.  If the command that follows
+			    this jump is clearly incompatible with the
+			    one at the beginning of the repeat, such that
+			    we can be sure that there is no use backtracking
+			    out of repetitions already completed,
+			    then we finalize. */
+    dummy_failure_jump,  /* jump, and push a dummy failure point.
+			    This failure point will be thrown away
+			    if an attempt is made to use it for a failure.
+			    A + construct makes this before the first repeat.  */
+    anychar,	 /* matches any one character */
+    charset,     /* matches any one char belonging to specified set.
+		    First following byte is # bitmap bytes.
+		    Then come bytes for a bit-map saying which chars are in.
+		    Bits in each byte are ordered low-bit-first.
+		    A character is in the set if its bit is 1.
+		    A character too large to have a bit in the map
+		    is automatically not in the set */
+    charset_not, /* similar but match any character that is NOT one of those specified */
+    start_memory, /* starts remembering the text that is matched
+		    and stores it in a memory register.
+		    followed by one byte containing the register number.
+		    Register numbers must be in the range 0 through NREGS. */
+    stop_memory, /* stops remembering the text that is matched
+		    and stores it in a memory register.
+		    followed by one byte containing the register number.
+		    Register numbers must be in the range 0 through NREGS. */
+    duplicate,    /* match a duplicate of something remembered.
+		    Followed by one byte containing the index of the memory register. */
+    before_dot,	 /* Succeeds if before dot */
+    at_dot,	 /* Succeeds if at dot */
+    after_dot,	 /* Succeeds if after dot */
+    begbuf,      /* Succeeds if at beginning of buffer */
+    endbuf,      /* Succeeds if at end of buffer */
+    wordchar,    /* Matches any word-constituent character */
+    notwordchar, /* Matches any char that is not a word-constituent */
+    wordbeg,	 /* Succeeds if at word beginning */
+    wordend,	 /* Succeeds if at word end */
+    wordbound,   /* Succeeds if at a word boundary */
+    notwordbound, /* Succeeds if not at a word boundary */
+    syntaxspec,  /* Matches any character whose syntax is specified.
+		    followed by a byte which contains a syntax code, Sword or such like */
+    notsyntaxspec /* Matches any character whose syntax differs from the specified. */
+  };
+
+extern char *re_compile_pattern ();
+/* Is this really advertised? */
+extern void re_compile_fastmap ();
+extern int re_search (), re_search_2 ();
+extern int re_match (), re_match_2 ();
+
+/* 4.2 bsd compatibility (yuck) */
+extern char *re_comp ();
+extern int re_exec ();
+
+#ifdef SYNTAX_TABLE
+extern char *re_syntax_table;
+#endif
diff --git a/gnu/usr.bin/grep/tests/khadafy.lines b/gnu/usr.bin/grep/tests/khadafy.lines
new file mode 100644
index 000000000000..57e21a17948d
--- /dev/null
+++ b/gnu/usr.bin/grep/tests/khadafy.lines
@@ -0,0 +1,32 @@
+1)  Muammar Qaddafi
+2)  Mo'ammar Gadhafi
+3)  Muammar Kaddafi
+4)  Muammar Qadhafi
+5)  Moammar El Kadhafi
+6)  Muammar Gadafi
+7)  Mu'ammar al-Qadafi
+8)  Moamer El Kazzafi
+9)  Moamar al-Gaddafi
+10) Mu'ammar Al Qathafi
+11) Muammar Al Qathafi
+12) Mo'ammar el-Gadhafi
+13) Moamar El Kadhafi
+14) Muammar al-Qadhafi
+15) Mu'ammar al-Qadhdhafi
+16) Mu'ammar Qadafi
+17) Moamar Gaddafi
+18) Mu'ammar Qadhdhafi
+19) Muammar Khaddafi
+20) Muammar al-Khaddafi
+21) Mu'amar al-Kadafi
+22) Muammar Ghaddafy
+23) Muammar Ghadafi
+24) Muammar Ghaddafi
+25) Muamar Kaddafi
+26) Muammar Quathafi
+27) Muammar Gheddafi
+28) Muamar Al-Kaddafi
+29) Moammar Khadafy
+30) Moammar Qudhafi
+31) Mu'ammar al-Qaddafi
+32) Mulazim Awwal Mu'ammar Muhammad Abu Minyar al-Qadhafi
diff --git a/gnu/usr.bin/grep/tests/khadafy.regexp b/gnu/usr.bin/grep/tests/khadafy.regexp
new file mode 100644
index 000000000000..46fe8ddd6dad
--- /dev/null
+++ b/gnu/usr.bin/grep/tests/khadafy.regexp
@@ -0,0 +1 @@
+M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]
diff --git a/gnu/usr.bin/grep/tests/regress.sh b/gnu/usr.bin/grep/tests/regress.sh
new file mode 100644
index 000000000000..b947036467e0
--- /dev/null
+++ b/gnu/usr.bin/grep/tests/regress.sh
@@ -0,0 +1,30 @@
+#! /bin/sh
+# Regression test for GNU e?grep.
+# Usage: regress.sh [dir-containing-egrep]
+
+builddir=${1-..}
+
+failures=0
+
+# The Khadafy test is brought to you by Scott Anderson . . .
+$builddir/egrep -f khadafy.regexp khadafy.lines > khadafy.out
+if cmp khadafy.lines khadafy.out
+then
+	rm khadafy.out
+else
+	echo Khadafy test failed -- output left on khadafy.out
+	failures=1
+fi
+
+# . . . and the following by Henry Spencer.
+
+awk -F: -f scriptgen.awk spencer.tests > tmp.script
+
+if sh tmp.script $builddir
+then
+	rm tmp.script
+	exit $failures
+else
+	rm tmp.script
+	exit 1
+fi
diff --git a/gnu/usr.bin/grep/tests/scriptgen.awk b/gnu/usr.bin/grep/tests/scriptgen.awk
new file mode 100644
index 000000000000..cdadc917aee4
--- /dev/null
+++ b/gnu/usr.bin/grep/tests/scriptgen.awk
@@ -0,0 +1,10 @@
+BEGIN { print "failures=0"; }
+!/^#/ && NF == 3 {
+	print "echo '" $3 "' | $1/egrep -e '" $2 "' > /dev/null 2>&1";
+	print "if [ $? != " $1 " ]"
+	print "then"
+	printf "\techo Spencer test \\#%d failed\n", ++n
+	print "\tfailures=1"
+	print "fi"
+}
+END { print "exit $failures"; }
diff --git a/gnu/usr.bin/grep/tests/spencer.tests b/gnu/usr.bin/grep/tests/spencer.tests
new file mode 100644
index 000000000000..c7eee9bdde83
--- /dev/null
+++ b/gnu/usr.bin/grep/tests/spencer.tests
@@ -0,0 +1,122 @@
+0:abc:abc
+1:abc:xbc
+1:abc:axc
+1:abc:abx
+0:abc:xabcy
+0:abc:ababc
+0:ab*c:abc
+0:ab*bc:abc
+0:ab*bc:abbc
+0:ab*bc:abbbbc
+0:ab+bc:abbc
+1:ab+bc:abc
+1:ab+bc:abq
+0:ab+bc:abbbbc
+0:ab?bc:abbc
+0:ab?bc:abc
+1:ab?bc:abbbbc
+0:ab?c:abc
+0:^abc$:abc
+1:^abc$:abcc
+0:^abc:abcc
+1:^abc$:aabc
+0:abc$:aabc
+0:^:abc
+0:$:abc
+0:a.c:abc
+0:a.c:axc
+0:a.*c:axyzc
+1:a.*c:axyzd
+1:a[bc]d:abc
+0:a[bc]d:abd
+1:a[b-d]e:abd
+0:a[b-d]e:ace
+0:a[b-d]:aac
+0:a[-b]:a-
+2:a[b-]:a-
+1:a[b-a]:-
+2:a[]b:-
+2:a[:-
+0:a]:a]
+0:a[]]b:a]b
+0:a[^bc]d:aed
+1:a[^bc]d:abd
+0:a[^-b]c:adc
+1:a[^-b]c:a-c
+1:a[^]b]c:a]c
+0:a[^]b]c:adc
+0:ab|cd:abc
+0:ab|cd:abcd
+0:()ef:def
+0:()*:-
+1:*a:-
+0:^*:-
+0:$*:-
+1:(*)b:-
+1:$b:b
+2:a\:-
+0:a\(b:a(b
+0:a\(*b:ab
+0:a\(*b:a((b
+1:a\x:a\x
+2:abc):-
+2:(abc:-
+0:((a)):abc
+0:(a)b(c):abc
+0:a+b+c:aabbabc
+0:a**:-
+0:a*?:-
+0:(a*)*:-
+0:(a*)+:-
+0:(a|)*:-
+0:(a*|b)*:-
+0:(a+|b)*:ab
+0:(a+|b)+:ab
+0:(a+|b)?:ab
+0:[^ab]*:cde
+0:(^)*:-
+0:(ab|)*:-
+2:)(:-
+1:abc:
+1:abc:
+0:a*:
+0:([abc])*d:abbbcd
+0:([abc])*bcd:abcd
+0:a|b|c|d|e:e
+0:(a|b|c|d|e)f:ef
+0:((a*|b))*:-
+0:abcd*efg:abcdefg
+0:ab*:xabyabbbz
+0:ab*:xayabbbz
+0:(ab|cd)e:abcde
+0:[abhgefdc]ij:hij
+1:^(ab|cd)e:abcde
+0:(abc|)ef:abcdef
+0:(a|b)c*d:abcd
+0:(ab|ab*)bc:abc
+0:a([bc]*)c*:abc
+0:a([bc]*)(c*d):abcd
+0:a([bc]+)(c*d):abcd
+0:a([bc]*)(c+d):abcd
+0:a[bcd]*dcdcde:adcdcde
+1:a[bcd]+dcdcde:adcdcde
+0:(ab|a)b*c:abc
+0:((a)(b)c)(d):abcd
+0:[A-Za-z_][A-Za-z0-9_]*:alpha
+0:^a(bc+|b[eh])g|.h$:abh
+0:(bc+d$|ef*g.|h?i(j|k)):effgz
+0:(bc+d$|ef*g.|h?i(j|k)):ij
+1:(bc+d$|ef*g.|h?i(j|k)):effg
+1:(bc+d$|ef*g.|h?i(j|k)):bcdd
+0:(bc+d$|ef*g.|h?i(j|k)):reffgz
+1:((((((((((a)))))))))):-
+0:(((((((((a))))))))):a
+1:multiple words of text:uh-uh
+0:multiple words:multiple words, yeah
+0:(.*)c(.*):abcde
+1:\((.*),:(.*)\)
+1:[k]:ab
+0:abcd:abcd
+0:a(bc)d:abcd
+0:a[-]?c:ac
+0:(....).*\1:beriberi