2837 lines
116 KiB
Plaintext
2837 lines
116 KiB
Plaintext
This is Info file regex.info, produced by Makeinfo-1.52 from the input
|
||
file .././doc/regex.texi.
|
||
|
||
This file documents the GNU regular expression library.
|
||
|
||
Copyright (C) 1992, 1993 Free Software Foundation, Inc.
|
||
|
||
Permission is granted to make and distribute verbatim copies of this
|
||
manual provided the copyright notice and this permission notice are
|
||
preserved on all copies.
|
||
|
||
Permission is granted to copy and distribute modified versions of this
|
||
manual under the conditions for verbatim copying, provided also that the
|
||
section entitled "GNU General Public License" is included exactly as in
|
||
the original, and provided that the entire resulting derived work is
|
||
distributed under the terms of a permission notice identical to this
|
||
one.
|
||
|
||
Permission is granted to copy and distribute translations of this
|
||
manual into another language, under the above conditions for modified
|
||
versions, except that the section entitled "GNU General Public License"
|
||
may be included in a translation approved by the Free Software
|
||
Foundation instead of in the original English.
|
||
|
||
|
||
File: regex.info, Node: Top, Next: Overview, Prev: (dir), Up: (dir)
|
||
|
||
Regular Expression Library
|
||
**************************
|
||
|
||
This manual documents how to program with the GNU regular expression
|
||
library. This is edition 0.12a of the manual, 19 September 1992.
|
||
|
||
The first part of this master menu lists the major nodes in this Info
|
||
document, including the index. The rest of the menu lists all the
|
||
lower level nodes in the document.
|
||
|
||
* Menu:
|
||
|
||
* Overview::
|
||
* Regular Expression Syntax::
|
||
* Common Operators::
|
||
* GNU Operators::
|
||
* GNU Emacs Operators::
|
||
* What Gets Matched?::
|
||
* Programming with Regex::
|
||
* Copying:: Copying and sharing Regex.
|
||
* Index:: General index.
|
||
-- The Detailed Node Listing --
|
||
|
||
Regular Expression Syntax
|
||
|
||
* Syntax Bits::
|
||
* Predefined Syntaxes::
|
||
* Collating Elements vs. Characters::
|
||
* The Backslash Character::
|
||
|
||
Common Operators
|
||
|
||
* Match-self Operator:: Ordinary characters.
|
||
* Match-any-character Operator:: .
|
||
* Concatenation Operator:: Juxtaposition.
|
||
* Repetition Operators:: * + ? {}
|
||
* Alternation Operator:: |
|
||
* List Operators:: [...] [^...]
|
||
* Grouping Operators:: (...)
|
||
* Back-reference Operator:: \digit
|
||
* Anchoring Operators:: ^ $
|
||
|
||
Repetition Operators
|
||
|
||
* Match-zero-or-more Operator:: *
|
||
* Match-one-or-more Operator:: +
|
||
* Match-zero-or-one Operator:: ?
|
||
* Interval Operators:: {}
|
||
|
||
List Operators (`[' ... `]' and `[^' ... `]')
|
||
|
||
* Character Class Operators:: [:class:]
|
||
* Range Operator:: start-end
|
||
|
||
Anchoring Operators
|
||
|
||
* Match-beginning-of-line Operator:: ^
|
||
* Match-end-of-line Operator:: $
|
||
|
||
GNU Operators
|
||
|
||
* Word Operators::
|
||
* Buffer Operators::
|
||
|
||
Word Operators
|
||
|
||
* Non-Emacs Syntax Tables::
|
||
* Match-word-boundary Operator:: \b
|
||
* Match-within-word Operator:: \B
|
||
* Match-beginning-of-word Operator:: \<
|
||
* Match-end-of-word Operator:: \>
|
||
* Match-word-constituent Operator:: \w
|
||
* Match-non-word-constituent Operator:: \W
|
||
|
||
Buffer Operators
|
||
|
||
* Match-beginning-of-buffer Operator:: \`
|
||
* Match-end-of-buffer Operator:: \'
|
||
|
||
GNU Emacs Operators
|
||
|
||
* Syntactic Class Operators::
|
||
|
||
Syntactic Class Operators
|
||
|
||
* Emacs Syntax Tables::
|
||
* Match-syntactic-class Operator:: \sCLASS
|
||
* Match-not-syntactic-class Operator:: \SCLASS
|
||
|
||
Programming with Regex
|
||
|
||
* GNU Regex Functions::
|
||
* POSIX Regex Functions::
|
||
* BSD Regex Functions::
|
||
|
||
GNU Regex Functions
|
||
|
||
* GNU Pattern Buffers:: The re_pattern_buffer type.
|
||
* GNU Regular Expression Compiling:: re_compile_pattern ()
|
||
* GNU Matching:: re_match ()
|
||
* GNU Searching:: re_search ()
|
||
* Matching/Searching with Split Data:: re_match_2 (), re_search_2 ()
|
||
* Searching with Fastmaps:: re_compile_fastmap ()
|
||
* GNU Translate Tables:: The `translate' field.
|
||
* Using Registers:: The re_registers type and related fns.
|
||
* Freeing GNU Pattern Buffers:: regfree ()
|
||
|
||
POSIX Regex Functions
|
||
|
||
* POSIX Pattern Buffers:: The regex_t type.
|
||
* POSIX Regular Expression Compiling:: regcomp ()
|
||
* POSIX Matching:: regexec ()
|
||
* Reporting Errors:: regerror ()
|
||
* Using Byte Offsets:: The regmatch_t type.
|
||
* Freeing POSIX Pattern Buffers:: regfree ()
|
||
|
||
BSD Regex Functions
|
||
|
||
* BSD Regular Expression Compiling:: re_comp ()
|
||
* BSD Searching:: re_exec ()
|
||
|
||
|
||
File: regex.info, Node: Overview, Next: Regular Expression Syntax, Prev: Top, Up: Top
|
||
|
||
Overview
|
||
********
|
||
|
||
A "regular expression" (or "regexp", or "pattern") is a text string
|
||
that describes some (mathematical) set of strings. A regexp R
|
||
"matches" a string S if S is in the set of strings described by R.
|
||
|
||
Using the Regex library, you can:
|
||
|
||
* see if a string matches a specified pattern as a whole, and
|
||
|
||
* search within a string for a substring matching a specified
|
||
pattern.
|
||
|
||
Some regular expressions match only one string, i.e., the set they
|
||
describe has only one member. For example, the regular expression
|
||
`foo' matches the string `foo' and no others. Other regular
|
||
expressions match more than one string, i.e., the set they describe has
|
||
more than one member. For example, the regular expression `f*' matches
|
||
the set of strings made up of any number (including zero) of `f's. As
|
||
you can see, some characters in regular expressions match themselves
|
||
(such as `f') and some don't (such as `*'); the ones that don't match
|
||
themselves instead let you specify patterns that describe many
|
||
different strings.
|
||
|
||
To either match or search for a regular expression with the Regex
|
||
library functions, you must first compile it with a Regex pattern
|
||
compiling function. A "compiled pattern" is a regular expression
|
||
converted to the internal format used by the library functions. Once
|
||
you've compiled a pattern, you can use it for matching or searching any
|
||
number of times.
|
||
|
||
The Regex library consists of two source files: `regex.h' and
|
||
`regex.c'. Regex provides three groups of functions with which you can
|
||
operate on regular expressions. One group--the GNU group--is more
|
||
powerful but not completely compatible with the other two, namely the
|
||
POSIX and Berkeley UNIX groups; its interface was designed specifically
|
||
for GNU. The other groups have the same interfaces as do the regular
|
||
expression functions in POSIX and Berkeley UNIX.
|
||
|
||
We wrote this chapter with programmers in mind, not users of
|
||
programs--such as Emacs--that use Regex. We describe the Regex library
|
||
in its entirety, not how to write regular expressions that a particular
|
||
program understands.
|
||
|
||
|
||
File: regex.info, Node: Regular Expression Syntax, Next: Common Operators, Prev: Overview, Up: Top
|
||
|
||
Regular Expression Syntax
|
||
*************************
|
||
|
||
"Characters" are things you can type. "Operators" are things in a
|
||
regular expression that match one or more characters. You compose
|
||
regular expressions from operators, which in turn you specify using one
|
||
or more characters.
|
||
|
||
Most characters represent what we call the match-self operator, i.e.,
|
||
they match themselves; we call these characters "ordinary". Other
|
||
characters represent either all or parts of fancier operators; e.g.,
|
||
`.' represents what we call the match-any-character operator (which, no
|
||
surprise, matches (almost) any character); we call these characters
|
||
"special". Two different things determine what characters represent
|
||
what operators:
|
||
|
||
1. the regular expression syntax your program has told the Regex
|
||
library to recognize, and
|
||
|
||
2. the context of the character in the regular expression.
|
||
|
||
In the following sections, we describe these things in more detail.
|
||
|
||
* Menu:
|
||
|
||
* Syntax Bits::
|
||
* Predefined Syntaxes::
|
||
* Collating Elements vs. Characters::
|
||
* The Backslash Character::
|
||
|
||
|
||
File: regex.info, Node: Syntax Bits, Next: Predefined Syntaxes, Up: Regular Expression Syntax
|
||
|
||
Syntax Bits
|
||
===========
|
||
|
||
In any particular syntax for regular expressions, some characters are
|
||
always special, others are sometimes special, and others are never
|
||
special. The particular syntax that Regex recognizes for a given
|
||
regular expression depends on the value in the `syntax' field of the
|
||
pattern buffer of that regular expression.
|
||
|
||
You get a pattern buffer by compiling a regular expression. *Note
|
||
GNU Pattern Buffers::, and *Note POSIX Pattern Buffers::, for more
|
||
information on pattern buffers. *Note GNU Regular Expression
|
||
Compiling::, *Note POSIX Regular Expression Compiling::, and *Note BSD
|
||
Regular Expression Compiling::, for more information on compiling.
|
||
|
||
Regex considers the value of the `syntax' field to be a collection of
|
||
bits; we refer to these bits as "syntax bits". In most cases, they
|
||
affect what characters represent what operators. We describe the
|
||
meanings of the operators to which we refer in *Note Common Operators::,
|
||
*Note GNU Operators::, and *Note GNU Emacs Operators::.
|
||
|
||
For reference, here is the complete list of syntax bits, in
|
||
alphabetical order:
|
||
|
||
`RE_BACKSLASH_ESCAPE_IN_LISTS'
|
||
If this bit is set, then `\' inside a list (*note List Operators::.
|
||
quotes (makes ordinary, if it's special) the following character;
|
||
if this bit isn't set, then `\' is an ordinary character inside
|
||
lists. (*Note The Backslash Character::, for what `\' does
|
||
outside of lists.)
|
||
|
||
`RE_BK_PLUS_QM'
|
||
If this bit is set, then `\+' represents the match-one-or-more
|
||
operator and `\?' represents the match-zero-or-more operator; if
|
||
this bit isn't set, then `+' represents the match-one-or-more
|
||
operator and `?' represents the match-zero-or-one operator. This
|
||
bit is irrelevant if `RE_LIMITED_OPS' is set.
|
||
|
||
`RE_CHAR_CLASSES'
|
||
If this bit is set, then you can use character classes in lists;
|
||
if this bit isn't set, then you can't.
|
||
|
||
`RE_CONTEXT_INDEP_ANCHORS'
|
||
If this bit is set, then `^' and `$' are special anywhere outside
|
||
a list; if this bit isn't set, then these characters are special
|
||
only in certain contexts. *Note Match-beginning-of-line
|
||
Operator::, and *Note Match-end-of-line Operator::.
|
||
|
||
`RE_CONTEXT_INDEP_OPS'
|
||
If this bit is set, then certain characters are special anywhere
|
||
outside a list; if this bit isn't set, then those characters are
|
||
special only in some contexts and are ordinary elsewhere.
|
||
Specifically, if this bit isn't set then `*', and (if the syntax
|
||
bit `RE_LIMITED_OPS' isn't set) `+' and `?' (or `\+' and `\?',
|
||
depending on the syntax bit `RE_BK_PLUS_QM') represent repetition
|
||
operators only if they're not first in a regular expression or
|
||
just after an open-group or alternation operator. The same holds
|
||
for `{' (or `\{', depending on the syntax bit `RE_NO_BK_BRACES') if
|
||
it is the beginning of a valid interval and the syntax bit
|
||
`RE_INTERVALS' is set.
|
||
|
||
`RE_CONTEXT_INVALID_OPS'
|
||
If this bit is set, then repetition and alternation operators
|
||
can't be in certain positions within a regular expression.
|
||
Specifically, the regular expression is invalid if it has:
|
||
|
||
* a repetition operator first in the regular expression or just
|
||
after a match-beginning-of-line, open-group, or alternation
|
||
operator; or
|
||
|
||
* an alternation operator first or last in the regular
|
||
expression, just before a match-end-of-line operator, or just
|
||
after an alternation or open-group operator.
|
||
|
||
If this bit isn't set, then you can put the characters
|
||
representing the repetition and alternation characters anywhere in
|
||
a regular expression. Whether or not they will in fact be
|
||
operators in certain positions depends on other syntax bits.
|
||
|
||
`RE_DOT_NEWLINE'
|
||
If this bit is set, then the match-any-character operator matches
|
||
a newline; if this bit isn't set, then it doesn't.
|
||
|
||
`RE_DOT_NOT_NULL'
|
||
If this bit is set, then the match-any-character operator doesn't
|
||
match a null character; if this bit isn't set, then it does.
|
||
|
||
`RE_INTERVALS'
|
||
If this bit is set, then Regex recognizes interval operators; if
|
||
this bit isn't set, then it doesn't.
|
||
|
||
`RE_LIMITED_OPS'
|
||
If this bit is set, then Regex doesn't recognize the
|
||
match-one-or-more, match-zero-or-one or alternation operators; if
|
||
this bit isn't set, then it does.
|
||
|
||
`RE_NEWLINE_ALT'
|
||
If this bit is set, then newline represents the alternation
|
||
operator; if this bit isn't set, then newline is ordinary.
|
||
|
||
`RE_NO_BK_BRACES'
|
||
If this bit is set, then `{' represents the open-interval operator
|
||
and `}' represents the close-interval operator; if this bit isn't
|
||
set, then `\{' represents the open-interval operator and `\}'
|
||
represents the close-interval operator. This bit is relevant only
|
||
if `RE_INTERVALS' is set.
|
||
|
||
`RE_NO_BK_PARENS'
|
||
If this bit is set, then `(' represents the open-group operator and
|
||
`)' represents the close-group operator; if this bit isn't set,
|
||
then `\(' represents the open-group operator and `\)' represents
|
||
the close-group operator.
|
||
|
||
`RE_NO_BK_REFS'
|
||
If this bit is set, then Regex doesn't recognize `\'DIGIT as the
|
||
back reference operator; if this bit isn't set, then it does.
|
||
|
||
`RE_NO_BK_VBAR'
|
||
If this bit is set, then `|' represents the alternation operator;
|
||
if this bit isn't set, then `\|' represents the alternation
|
||
operator. This bit is irrelevant if `RE_LIMITED_OPS' is set.
|
||
|
||
`RE_NO_EMPTY_RANGES'
|
||
If this bit is set, then a regular expression with a range whose
|
||
ending point collates lower than its starting point is invalid; if
|
||
this bit isn't set, then Regex considers such a range to be empty.
|
||
|
||
`RE_UNMATCHED_RIGHT_PAREN_ORD'
|
||
If this bit is set and the regular expression has no matching
|
||
open-group operator, then Regex considers what would otherwise be
|
||
a close-group operator (based on how `RE_NO_BK_PARENS' is set) to
|
||
match `)'.
|
||
|
||
|
||
File: regex.info, Node: Predefined Syntaxes, Next: Collating Elements vs. Characters, Prev: Syntax Bits, Up: Regular Expression Syntax
|
||
|
||
Predefined Syntaxes
|
||
===================
|
||
|
||
If you're programming with Regex, you can set a pattern buffer's
|
||
(*note GNU Pattern Buffers::., and *Note POSIX Pattern Buffers::)
|
||
`syntax' field either to an arbitrary combination of syntax bits (*note
|
||
Syntax Bits::.) or else to the configurations defined by Regex. These
|
||
configurations define the syntaxes used by certain programs--GNU Emacs,
|
||
POSIX Awk, traditional Awk, Grep, Egrep--in addition to syntaxes for
|
||
POSIX basic and extended regular expressions.
|
||
|
||
The predefined syntaxes-taken directly from `regex.h'--are:
|
||
|
||
#define RE_SYNTAX_EMACS 0
|
||
|
||
#define RE_SYNTAX_AWK \
|
||
(RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \
|
||
| RE_NO_BK_PARENS | RE_NO_BK_REFS \
|
||
| RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \
|
||
| RE_UNMATCHED_RIGHT_PAREN_ORD)
|
||
|
||
#define RE_SYNTAX_POSIX_AWK \
|
||
(RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS)
|
||
|
||
#define RE_SYNTAX_GREP \
|
||
(RE_BK_PLUS_QM | RE_CHAR_CLASSES \
|
||
| RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \
|
||
| RE_NEWLINE_ALT)
|
||
|
||
#define RE_SYNTAX_EGREP \
|
||
(RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \
|
||
| RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \
|
||
| RE_NEWLINE_ALT | RE_NO_BK_PARENS \
|
||
| RE_NO_BK_VBAR)
|
||
|
||
#define RE_SYNTAX_POSIX_EGREP \
|
||
(RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES)
|
||
|
||
/* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */
|
||
#define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC
|
||
|
||
#define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC
|
||
|
||
/* Syntax bits common to both basic and extended POSIX regex syntax. */
|
||
#define _RE_SYNTAX_POSIX_COMMON \
|
||
(RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \
|
||
| RE_INTERVALS | RE_NO_EMPTY_RANGES)
|
||
|
||
#define RE_SYNTAX_POSIX_BASIC \
|
||
(_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)
|
||
|
||
/* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
|
||
RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this
|
||
isn't minimal, since other operators, such as \`, aren't disabled. */
|
||
#define RE_SYNTAX_POSIX_MINIMAL_BASIC \
|
||
(_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)
|
||
|
||
#define RE_SYNTAX_POSIX_EXTENDED \
|
||
(_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
|
||
| RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \
|
||
| RE_NO_BK_PARENS | RE_NO_BK_VBAR \
|
||
| RE_UNMATCHED_RIGHT_PAREN_ORD)
|
||
|
||
/* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS
|
||
replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added. */
|
||
#define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \
|
||
(_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
|
||
| RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \
|
||
| RE_NO_BK_PARENS | RE_NO_BK_REFS \
|
||
| RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD)
|
||
|
||
|
||
File: regex.info, Node: Collating Elements vs. Characters, Next: The Backslash Character, Prev: Predefined Syntaxes, Up: Regular Expression Syntax
|
||
|
||
Collating Elements vs. Characters
|
||
=================================
|
||
|
||
POSIX generalizes the notion of a character to that of a collating
|
||
element. It defines a "collating element" to be "a sequence of one or
|
||
more bytes defined in the current collating sequence as a unit of
|
||
collation."
|
||
|
||
This generalizes the notion of a character in two ways. First, a
|
||
single character can map into two or more collating elements. For
|
||
example, the German "es-zet" collates as the collating element `s'
|
||
followed by another collating element `s'. Second, two or more
|
||
characters can map into one collating element. For example, the
|
||
Spanish `ll' collates after `l' and before `m'.
|
||
|
||
Since POSIX's "collating element" preserves the essential idea of a
|
||
"character," we use the latter, more familiar, term in this document.
|
||
|
||
|
||
File: regex.info, Node: The Backslash Character, Prev: Collating Elements vs. Characters, Up: Regular Expression Syntax
|
||
|
||
The Backslash Character
|
||
=======================
|
||
|
||
The `\' character has one of four different meanings, depending on
|
||
the context in which you use it and what syntax bits are set (*note
|
||
Syntax Bits::.). It can: 1) stand for itself, 2) quote the next
|
||
character, 3) introduce an operator, or 4) do nothing.
|
||
|
||
1. It stands for itself inside a list (*note List Operators::.) if
|
||
the syntax bit `RE_BACKSLASH_ESCAPE_IN_LISTS' is not set. For
|
||
example, `[\]' would match `\'.
|
||
|
||
2. It quotes (makes ordinary, if it's special) the next character
|
||
when you use it either:
|
||
|
||
* outside a list,(1) or
|
||
|
||
* inside a list and the syntax bit
|
||
`RE_BACKSLASH_ESCAPE_IN_LISTS' is set.
|
||
|
||
3. It introduces an operator when followed by certain ordinary
|
||
characters--sometimes only when certain syntax bits are set. See
|
||
the cases `RE_BK_PLUS_QM', `RE_NO_BK_BRACES', `RE_NO_BK_VAR',
|
||
`RE_NO_BK_PARENS', `RE_NO_BK_REF' in *Note Syntax Bits::. Also:
|
||
|
||
* `\b' represents the match-word-boundary operator (*note
|
||
Match-word-boundary Operator::.).
|
||
|
||
* `\B' represents the match-within-word operator (*note
|
||
Match-within-word Operator::.).
|
||
|
||
* `\<' represents the match-beginning-of-word operator
|
||
(*note Match-beginning-of-word Operator::.).
|
||
|
||
* `\>' represents the match-end-of-word operator (*note
|
||
Match-end-of-word Operator::.).
|
||
|
||
* `\w' represents the match-word-constituent operator (*note
|
||
Match-word-constituent Operator::.).
|
||
|
||
* `\W' represents the match-non-word-constituent operator
|
||
(*note Match-non-word-constituent Operator::.).
|
||
|
||
* `\`' represents the match-beginning-of-buffer operator and
|
||
`\'' represents the match-end-of-buffer operator (*note
|
||
Buffer Operators::.).
|
||
|
||
* If Regex was compiled with the C preprocessor symbol `emacs'
|
||
defined, then `\sCLASS' represents the match-syntactic-class
|
||
operator and `\SCLASS' represents the
|
||
match-not-syntactic-class operator (*note Syntactic Class
|
||
Operators::.).
|
||
|
||
4. In all other cases, Regex ignores `\'. For example, `\n' matches
|
||
`n'.
|
||
|
||
|
||
---------- Footnotes ----------
|
||
|
||
(1) Sometimes you don't have to explicitly quote special characters
|
||
to make them ordinary. For instance, most characters lose any special
|
||
meaning inside a list (*note List Operators::.). In addition, if the
|
||
syntax bits `RE_CONTEXT_INVALID_OPS' and `RE_CONTEXT_INDEP_OPS' aren't
|
||
set, then (for historical reasons) the matcher considers special
|
||
characters ordinary if they are in contexts where the operations they
|
||
represent make no sense; for example, then the match-zero-or-more
|
||
operator (represented by `*') matches itself in the regular expression
|
||
`*foo' because there is no preceding expression on which it can
|
||
operate. It is poor practice, however, to depend on this behavior; if
|
||
you want a special character to be ordinary outside a list, it's better
|
||
to always quote it, regardless.
|
||
|
||
|
||
File: regex.info, Node: Common Operators, Next: GNU Operators, Prev: Regular Expression Syntax, Up: Top
|
||
|
||
Common Operators
|
||
****************
|
||
|
||
You compose regular expressions from operators. In the following
|
||
sections, we describe the regular expression operators specified by
|
||
POSIX; GNU also uses these. Most operators have more than one
|
||
representation as characters. *Note Regular Expression Syntax::, for
|
||
what characters represent what operators under what circumstances.
|
||
|
||
For most operators that can be represented in two ways, one
|
||
representation is a single character and the other is that character
|
||
preceded by `\'. For example, either `(' or `\(' represents the
|
||
open-group operator. Which one does depends on the setting of a syntax
|
||
bit, in this case `RE_NO_BK_PARENS'. Why is this so? Historical
|
||
reasons dictate some of the varying representations, while POSIX
|
||
dictates others.
|
||
|
||
Finally, almost all characters lose any special meaning inside a list
|
||
(*note List Operators::.).
|
||
|
||
* Menu:
|
||
|
||
* Match-self Operator:: Ordinary characters.
|
||
* Match-any-character Operator:: .
|
||
* Concatenation Operator:: Juxtaposition.
|
||
* Repetition Operators:: * + ? {}
|
||
* Alternation Operator:: |
|
||
* List Operators:: [...] [^...]
|
||
* Grouping Operators:: (...)
|
||
* Back-reference Operator:: \digit
|
||
* Anchoring Operators:: ^ $
|
||
|
||
|
||
File: regex.info, Node: Match-self Operator, Next: Match-any-character Operator, Up: Common Operators
|
||
|
||
The Match-self Operator (ORDINARY CHARACTER)
|
||
============================================
|
||
|
||
This operator matches the character itself. All ordinary characters
|
||
(*note Regular Expression Syntax::.) represent this operator. For
|
||
example, `f' is always an ordinary character, so the regular expression
|
||
`f' matches only the string `f'. In particular, it does *not* match
|
||
the string `ff'.
|
||
|
||
|
||
File: regex.info, Node: Match-any-character Operator, Next: Concatenation Operator, Prev: Match-self Operator, Up: Common Operators
|
||
|
||
The Match-any-character Operator (`.')
|
||
======================================
|
||
|
||
This operator matches any single printing or nonprinting character
|
||
except it won't match a:
|
||
|
||
newline
|
||
if the syntax bit `RE_DOT_NEWLINE' isn't set.
|
||
|
||
null
|
||
if the syntax bit `RE_DOT_NOT_NULL' is set.
|
||
|
||
The `.' (period) character represents this operator. For example,
|
||
`a.b' matches any three-character string beginning with `a' and ending
|
||
with `b'.
|
||
|
||
|
||
File: regex.info, Node: Concatenation Operator, Next: Repetition Operators, Prev: Match-any-character Operator, Up: Common Operators
|
||
|
||
The Concatenation Operator
|
||
==========================
|
||
|
||
This operator concatenates two regular expressions A and B. No
|
||
character represents this operator; you simply put B after A. The
|
||
result is a regular expression that will match a string if A matches
|
||
its first part and B matches the rest. For example, `xy' (two
|
||
match-self operators) matches `xy'.
|
||
|
||
|
||
File: regex.info, Node: Repetition Operators, Next: Alternation Operator, Prev: Concatenation Operator, Up: Common Operators
|
||
|
||
Repetition Operators
|
||
====================
|
||
|
||
Repetition operators repeat the preceding regular expression a
|
||
specified number of times.
|
||
|
||
* Menu:
|
||
|
||
* Match-zero-or-more Operator:: *
|
||
* Match-one-or-more Operator:: +
|
||
* Match-zero-or-one Operator:: ?
|
||
* Interval Operators:: {}
|
||
|
||
|
||
File: regex.info, Node: Match-zero-or-more Operator, Next: Match-one-or-more Operator, Up: Repetition Operators
|
||
|
||
The Match-zero-or-more Operator (`*')
|
||
-------------------------------------
|
||
|
||
This operator repeats the smallest possible preceding regular
|
||
expression as many times as necessary (including zero) to match the
|
||
pattern. `*' represents this operator. For example, `o*' matches any
|
||
string made up of zero or more `o's. Since this operator operates on
|
||
the smallest preceding regular expression, `fo*' has a repeating `o',
|
||
not a repeating `fo'. So, `fo*' matches `f', `fo', `foo', and so on.
|
||
|
||
Since the match-zero-or-more operator is a suffix operator, it may be
|
||
useless as such when no regular expression precedes it. This is the
|
||
case when it:
|
||
|
||
* is first in a regular expression, or
|
||
|
||
* follows a match-beginning-of-line, open-group, or alternation
|
||
operator.
|
||
|
||
Three different things can happen in these cases:
|
||
|
||
1. If the syntax bit `RE_CONTEXT_INVALID_OPS' is set, then the
|
||
regular expression is invalid.
|
||
|
||
2. If `RE_CONTEXT_INVALID_OPS' isn't set, but `RE_CONTEXT_INDEP_OPS'
|
||
is, then `*' represents the match-zero-or-more operator (which
|
||
then operates on the empty string).
|
||
|
||
3. Otherwise, `*' is ordinary.
|
||
|
||
|
||
The matcher processes a match-zero-or-more operator by first matching
|
||
as many repetitions of the smallest preceding regular expression as it
|
||
can. Then it continues to match the rest of the pattern.
|
||
|
||
If it can't match the rest of the pattern, it backtracks (as many
|
||
times as necessary), each time discarding one of the matches until it
|
||
can either match the entire pattern or be certain that it cannot get a
|
||
match. For example, when matching `ca*ar' against `caaar', the matcher
|
||
first matches all three `a's of the string with the `a*' of the regular
|
||
expression. However, it cannot then match the final `ar' of the
|
||
regular expression against the final `r' of the string. So it
|
||
backtracks, discarding the match of the last `a' in the string. It can
|
||
then match the remaining `ar'.
|
||
|
||
|
||
File: regex.info, Node: Match-one-or-more Operator, Next: Match-zero-or-one Operator, Prev: Match-zero-or-more Operator, Up: Repetition Operators
|
||
|
||
The Match-one-or-more Operator (`+' or `\+')
|
||
--------------------------------------------
|
||
|
||
If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't
|
||
recognize this operator. Otherwise, if the syntax bit `RE_BK_PLUS_QM'
|
||
isn't set, then `+' represents this operator; if it is, then `\+' does.
|
||
|
||
This operator is similar to the match-zero-or-more operator except
|
||
that it repeats the preceding regular expression at least once; *note
|
||
Match-zero-or-more Operator::., for what it operates on, how some
|
||
syntax bits affect it, and how Regex backtracks to match it.
|
||
|
||
For example, supposing that `+' represents the match-one-or-more
|
||
operator; then `ca+r' matches, e.g., `car' and `caaaar', but not `cr'.
|
||
|
||
|
||
File: regex.info, Node: Match-zero-or-one Operator, Next: Interval Operators, Prev: Match-one-or-more Operator, Up: Repetition Operators
|
||
|
||
The Match-zero-or-one Operator (`?' or `\?')
|
||
--------------------------------------------
|
||
|
||
If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't
|
||
recognize this operator. Otherwise, if the syntax bit `RE_BK_PLUS_QM'
|
||
isn't set, then `?' represents this operator; if it is, then `\?' does.
|
||
|
||
This operator is similar to the match-zero-or-more operator except
|
||
that it repeats the preceding regular expression once or not at all;
|
||
*note Match-zero-or-more Operator::., to see what it operates on, how
|
||
some syntax bits affect it, and how Regex backtracks to match it.
|
||
|
||
For example, supposing that `?' represents the match-zero-or-one
|
||
operator; then `ca?r' matches both `car' and `cr', but nothing else.
|
||
|
||
|
||
File: regex.info, Node: Interval Operators, Prev: Match-zero-or-one Operator, Up: Repetition Operators
|
||
|
||
Interval Operators (`{' ... `}' or `\{' ... `\}')
|
||
-------------------------------------------------
|
||
|
||
If the syntax bit `RE_INTERVALS' is set, then Regex recognizes
|
||
"interval expressions". They repeat the smallest possible preceding
|
||
regular expression a specified number of times.
|
||
|
||
If the syntax bit `RE_NO_BK_BRACES' is set, `{' represents the
|
||
"open-interval operator" and `}' represents the "close-interval
|
||
operator" ; otherwise, `\{' and `\}' do.
|
||
|
||
Specifically, supposing that `{' and `}' represent the open-interval
|
||
and close-interval operators; then:
|
||
|
||
`{COUNT}'
|
||
matches exactly COUNT occurrences of the preceding regular
|
||
expression.
|
||
|
||
`{MIN,}'
|
||
matches MIN or more occurrences of the preceding regular
|
||
expression.
|
||
|
||
`{MIN, MAX}'
|
||
matches at least MIN but no more than MAX occurrences of the
|
||
preceding regular expression.
|
||
|
||
The interval expression (but not necessarily the regular expression
|
||
that contains it) is invalid if:
|
||
|
||
* MIN is greater than MAX, or
|
||
|
||
* any of COUNT, MIN, or MAX are outside the range zero to
|
||
`RE_DUP_MAX' (which symbol `regex.h' defines).
|
||
|
||
If the interval expression is invalid and the syntax bit
|
||
`RE_NO_BK_BRACES' is set, then Regex considers all the characters in
|
||
the would-be interval to be ordinary. If that bit isn't set, then the
|
||
regular expression is invalid.
|
||
|
||
If the interval expression is valid but there is no preceding regular
|
||
expression on which to operate, then if the syntax bit
|
||
`RE_CONTEXT_INVALID_OPS' is set, the regular expression is invalid. If
|
||
that bit isn't set, then Regex considers all the characters--other than
|
||
backslashes, which it ignores--in the would-be interval to be ordinary.
|
||
|
||
|
||
File: regex.info, Node: Alternation Operator, Next: List Operators, Prev: Repetition Operators, Up: Common Operators
|
||
|
||
The Alternation Operator (`|' or `\|')
|
||
======================================
|
||
|
||
If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't
|
||
recognize this operator. Otherwise, if the syntax bit `RE_NO_BK_VBAR'
|
||
is set, then `|' represents this operator; otherwise, `\|' does.
|
||
|
||
Alternatives match one of a choice of regular expressions: if you put
|
||
the character(s) representing the alternation operator between any two
|
||
regular expressions A and B, the result matches the union of the
|
||
strings that A and B match. For example, supposing that `|' is the
|
||
alternation operator, then `foo|bar|quux' would match any of `foo',
|
||
`bar' or `quux'.
|
||
|
||
The alternation operator operates on the *largest* possible
|
||
surrounding regular expressions. (Put another way, it has the lowest
|
||
precedence of any regular expression operator.) Thus, the only way you
|
||
can delimit its arguments is to use grouping. For example, if `(' and
|
||
`)' are the open and close-group operators, then `fo(o|b)ar' would
|
||
match either `fooar' or `fobar'. (`foo|bar' would match `foo' or
|
||
`bar'.)
|
||
|
||
The matcher usually tries all combinations of alternatives so as to
|
||
match the longest possible string. For example, when matching
|
||
`(fooq|foo)*(qbarquux|bar)' against `fooqbarquux', it cannot take, say,
|
||
the first ("depth-first") combination it could match, since then it
|
||
would be content to match just `fooqbar'.
|
||
|
||
|
||
File: regex.info, Node: List Operators, Next: Grouping Operators, Prev: Alternation Operator, Up: Common Operators
|
||
|
||
List Operators (`[' ... `]' and `[^' ... `]')
|
||
=============================================
|
||
|
||
"Lists", also called "bracket expressions", are a set of one or more
|
||
items. An "item" is a character, a character class expression, or a
|
||
range expression. The syntax bits affect which kinds of items you can
|
||
put in a list. We explain the last two items in subsections below.
|
||
Empty lists are invalid.
|
||
|
||
A "matching list" matches a single character represented by one of
|
||
the list items. You form a matching list by enclosing one or more items
|
||
within an "open-matching-list operator" (represented by `[') and a
|
||
"close-list operator" (represented by `]').
|
||
|
||
For example, `[ab]' matches either `a' or `b'. `[ad]*' matches the
|
||
empty string and any string composed of just `a's and `d's in any
|
||
order. Regex considers invalid a regular expression with a `[' but no
|
||
matching `]'.
|
||
|
||
"Nonmatching lists" are similar to matching lists except that they
|
||
match a single character *not* represented by one of the list items.
|
||
You use an "open-nonmatching-list operator" (represented by `[^'(1))
|
||
instead of an open-matching-list operator to start a nonmatching list.
|
||
|
||
For example, `[^ab]' matches any character except `a' or `b'.
|
||
|
||
If the `posix_newline' field in the pattern buffer (*note GNU Pattern
|
||
Buffers::. is set, then nonmatching lists do not match a newline.
|
||
|
||
Most characters lose any special meaning inside a list. The special
|
||
characters inside a list follow.
|
||
|
||
`]'
|
||
ends the list if it's not the first list item. So, if you want to
|
||
make the `]' character a list item, you must put it first.
|
||
|
||
`\'
|
||
quotes the next character if the syntax bit
|
||
`RE_BACKSLASH_ESCAPE_IN_LISTS' is set.
|
||
|
||
`[:'
|
||
represents the open-character-class operator (*note Character
|
||
Class Operators::.) if the syntax bit `RE_CHAR_CLASSES' is set and
|
||
what follows is a valid character class expression.
|
||
|
||
`:]'
|
||
represents the close-character-class operator if the syntax bit
|
||
`RE_CHAR_CLASSES' is set and what precedes it is an
|
||
open-character-class operator followed by a valid character class
|
||
name.
|
||
|
||
`-'
|
||
represents the range operator (*note Range Operator::.) if it's
|
||
not first or last in a list or the ending point of a range.
|
||
|
||
All other characters are ordinary. For example, `[.*]' matches `.' and
|
||
`*'.
|
||
|
||
* Menu:
|
||
|
||
* Character Class Operators:: [:class:]
|
||
* Range Operator:: start-end
|
||
|
||
---------- Footnotes ----------
|
||
|
||
(1) Regex therefore doesn't consider the `^' to be the first
|
||
character in the list. If you put a `^' character first in (what you
|
||
think is) a matching list, you'll turn it into a nonmatching list.
|
||
|
||
|
||
File: regex.info, Node: Character Class Operators, Next: Range Operator, Up: List Operators
|
||
|
||
Character Class Operators (`[:' ... `:]')
|
||
-----------------------------------------
|
||
|
||
If the syntax bit `RE_CHARACTER_CLASSES' is set, then Regex
|
||
recognizes character class expressions inside lists. A "character
|
||
class expression" matches one character from a given class. You form a
|
||
character class expression by putting a character class name between an
|
||
"open-character-class operator" (represented by `[:') and a
|
||
"close-character-class operator" (represented by `:]'). The character
|
||
class names and their meanings are:
|
||
|
||
`alnum'
|
||
letters and digits
|
||
|
||
`alpha'
|
||
letters
|
||
|
||
`blank'
|
||
system-dependent; for GNU, a space or tab
|
||
|
||
`cntrl'
|
||
control characters (in the ASCII encoding, code 0177 and codes
|
||
less than 040)
|
||
|
||
`digit'
|
||
digits
|
||
|
||
`graph'
|
||
same as `print' except omits space
|
||
|
||
`lower'
|
||
lowercase letters
|
||
|
||
`print'
|
||
printable characters (in the ASCII encoding, space tilde--codes
|
||
040 through 0176)
|
||
|
||
`punct'
|
||
neither control nor alphanumeric characters
|
||
|
||
`space'
|
||
space, carriage return, newline, vertical tab, and form feed
|
||
|
||
`upper'
|
||
uppercase letters
|
||
|
||
`xdigit'
|
||
hexadecimal digits: `0'-`9', `a'-`f', `A'-`F'
|
||
|
||
These correspond to the definitions in the C library's `<ctype.h>'
|
||
facility. For example, `[:alpha:]' corresponds to the standard
|
||
facility `isalpha'. Regex recognizes character class expressions only
|
||
inside of lists; so `[[:alpha:]]' matches any letter, but `[:alpha:]'
|
||
outside of a bracket expression and not followed by a repetition
|
||
operator matches just itself.
|
||
|
||
|
||
File: regex.info, Node: Range Operator, Prev: Character Class Operators, Up: List Operators
|
||
|
||
The Range Operator (`-')
|
||
------------------------
|
||
|
||
Regex recognizes "range expressions" inside a list. They represent
|
||
those characters that fall between two elements in the current
|
||
collating sequence. You form a range expression by putting a "range
|
||
operator" between two characters.(1) `-' represents the range operator.
|
||
For example, `a-f' within a list represents all the characters from `a'
|
||
through `f' inclusively.
|
||
|
||
If the syntax bit `RE_NO_EMPTY_RANGES' is set, then if the range's
|
||
ending point collates less than its starting point, the range (and the
|
||
regular expression containing it) is invalid. For example, the regular
|
||
expression `[z-a]' would be invalid. If this bit isn't set, then Regex
|
||
considers such a range to be empty.
|
||
|
||
Since `-' represents the range operator, if you want to make a `-'
|
||
character itself a list item, you must do one of the following:
|
||
|
||
* Put the `-' either first or last in the list.
|
||
|
||
* Include a range whose starting point collates strictly lower than
|
||
`-' and whose ending point collates equal or higher. Unless a
|
||
range is the first item in a list, a `-' can't be its starting
|
||
point, but *can* be its ending point. That is because Regex
|
||
considers `-' to be the range operator unless it is preceded by
|
||
another `-'. For example, in the ASCII encoding, `)', `*', `+',
|
||
`,', `-', `.', and `/' are contiguous characters in the collating
|
||
sequence. You might think that `[)-+--/]' has two ranges: `)-+'
|
||
and `--/'. Rather, it has the ranges `)-+' and `+--', plus the
|
||
character `/', so it matches, e.g., `,', not `.'.
|
||
|
||
* Put a range whose starting point is `-' first in the list.
|
||
|
||
For example, `[-a-z]' matches a lowercase letter or a hyphen (in
|
||
English, in ASCII).
|
||
|
||
---------- Footnotes ----------
|
||
|
||
(1) You can't use a character class for the starting or ending point
|
||
of a range, since a character class is not a single character.
|
||
|
||
|
||
File: regex.info, Node: Grouping Operators, Next: Back-reference Operator, Prev: List Operators, Up: Common Operators
|
||
|
||
Grouping Operators (`(' ... `)' or `\(' ... `\)')
|
||
=================================================
|
||
|
||
A "group", also known as a "subexpression", consists of an
|
||
"open-group operator", any number of other operators, and a
|
||
"close-group operator". Regex treats this sequence as a unit, just as
|
||
mathematics and programming languages treat a parenthesized expression
|
||
as a unit.
|
||
|
||
Therefore, using "groups", you can:
|
||
|
||
* delimit the argument(s) to an alternation operator (*note
|
||
Alternation Operator::.) or a repetition operator (*note
|
||
Repetition Operators::.).
|
||
|
||
* keep track of the indices of the substring that matched a given
|
||
group. *Note Using Registers::, for a precise explanation. This
|
||
lets you:
|
||
|
||
* use the back-reference operator (*note Back-reference
|
||
Operator::.).
|
||
|
||
* use registers (*note Using Registers::.).
|
||
|
||
If the syntax bit `RE_NO_BK_PARENS' is set, then `(' represents the
|
||
open-group operator and `)' represents the close-group operator;
|
||
otherwise, `\(' and `\)' do.
|
||
|
||
If the syntax bit `RE_UNMATCHED_RIGHT_PAREN_ORD' is set and a
|
||
close-group operator has no matching open-group operator, then Regex
|
||
considers it to match `)'.
|
||
|
||
|
||
File: regex.info, Node: Back-reference Operator, Next: Anchoring Operators, Prev: Grouping Operators, Up: Common Operators
|
||
|
||
The Back-reference Operator ("\"DIGIT)
|
||
======================================
|
||
|
||
If the syntax bit `RE_NO_BK_REF' isn't set, then Regex recognizes
|
||
back references. A back reference matches a specified preceding group.
|
||
The back reference operator is represented by `\DIGIT' anywhere after
|
||
the end of a regular expression's DIGIT-th group (*note Grouping
|
||
Operators::.).
|
||
|
||
DIGIT must be between `1' and `9'. The matcher assigns numbers 1
|
||
through 9 to the first nine groups it encounters. By using one of `\1'
|
||
through `\9' after the corresponding group's close-group operator, you
|
||
can match a substring identical to the one that the group does.
|
||
|
||
Back references match according to the following (in all examples
|
||
below, `(' represents the open-group, `)' the close-group, `{' the
|
||
open-interval and `}' the close-interval operator):
|
||
|
||
* If the group matches a substring, the back reference matches an
|
||
identical substring. For example, `(a)\1' matches `aa' and
|
||
`(bana)na\1bo\1' matches `bananabanabobana'. Likewise, `(.*)\1'
|
||
matches any (newline-free if the syntax bit `RE_DOT_NEWLINE' isn't
|
||
set) string that is composed of two identical halves; the `(.*)'
|
||
matches the first half and the `\1' matches the second half.
|
||
|
||
* If the group matches more than once (as it might if followed by,
|
||
e.g., a repetition operator), then the back reference matches the
|
||
substring the group *last* matched. For example, `((a*)b)*\1\2'
|
||
matches `aabababa'; first group 1 (the outer one) matches `aab'
|
||
and group 2 (the inner one) matches `aa'. Then group 1 matches
|
||
`ab' and group 2 matches `a'. So, `\1' matches `ab' and `\2'
|
||
matches `a'.
|
||
|
||
* If the group doesn't participate in a match, i.e., it is part of an
|
||
alternative not taken or a repetition operator allows zero
|
||
repetitions of it, then the back reference makes the whole match
|
||
fail. For example, `(one()|two())-and-(three\2|four\3)' matches
|
||
`one-and-three' and `two-and-four', but not `one-and-four' or
|
||
`two-and-three'. For example, if the pattern matches `one-and-',
|
||
then its group 2 matches the empty string and its group 3 doesn't
|
||
participate in the match. So, if it then matches `four', then
|
||
when it tries to back reference group 3--which it will attempt to
|
||
do because `\3' follows the `four'--the match will fail because
|
||
group 3 didn't participate in the match.
|
||
|
||
You can use a back reference as an argument to a repetition operator.
|
||
For example, `(a(b))\2*' matches `a' followed by two or more `b's.
|
||
Similarly, `(a(b))\2{3}' matches `abbbb'.
|
||
|
||
If there is no preceding DIGIT-th subexpression, the regular
|
||
expression is invalid.
|
||
|
||
|
||
File: regex.info, Node: Anchoring Operators, Prev: Back-reference Operator, Up: Common Operators
|
||
|
||
Anchoring Operators
|
||
===================
|
||
|
||
These operators can constrain a pattern to match only at the
|
||
beginning or end of the entire string or at the beginning or end of a
|
||
line.
|
||
|
||
* Menu:
|
||
|
||
* Match-beginning-of-line Operator:: ^
|
||
* Match-end-of-line Operator:: $
|
||
|
||
|
||
File: regex.info, Node: Match-beginning-of-line Operator, Next: Match-end-of-line Operator, Up: Anchoring Operators
|
||
|
||
The Match-beginning-of-line Operator (`^')
|
||
------------------------------------------
|
||
|
||
This operator can match the empty string either at the beginning of
|
||
the string or after a newline character. Thus, it is said to "anchor"
|
||
the pattern to the beginning of a line.
|
||
|
||
In the cases following, `^' represents this operator. (Otherwise,
|
||
`^' is ordinary.)
|
||
|
||
* It (the `^') is first in the pattern, as in `^foo'.
|
||
|
||
* The syntax bit `RE_CONTEXT_INDEP_ANCHORS' is set, and it is outside
|
||
a bracket expression.
|
||
|
||
* It follows an open-group or alternation operator, as in `a\(^b\)'
|
||
and `a\|^b'. *Note Grouping Operators::, and *Note Alternation
|
||
Operator::.
|
||
|
||
These rules imply that some valid patterns containing `^' cannot be
|
||
matched; for example, `foo^bar' if `RE_CONTEXT_INDEP_ANCHORS' is set.
|
||
|
||
If the `not_bol' field is set in the pattern buffer (*note GNU
|
||
Pattern Buffers::.), then `^' fails to match at the beginning of the
|
||
string. *Note POSIX Matching::, for when you might find this useful.
|
||
|
||
If the `newline_anchor' field is set in the pattern buffer, then `^'
|
||
fails to match after a newline. This is useful when you do not regard
|
||
the string to be matched as broken into lines.
|
||
|
||
|
||
File: regex.info, Node: Match-end-of-line Operator, Prev: Match-beginning-of-line Operator, Up: Anchoring Operators
|
||
|
||
The Match-end-of-line Operator (`$')
|
||
------------------------------------
|
||
|
||
This operator can match the empty string either at the end of the
|
||
string or before a newline character in the string. Thus, it is said
|
||
to "anchor" the pattern to the end of a line.
|
||
|
||
It is always represented by `$'. For example, `foo$' usually
|
||
matches, e.g., `foo' and, e.g., the first three characters of
|
||
`foo\nbar'.
|
||
|
||
Its interaction with the syntax bits and pattern buffer fields is
|
||
exactly the dual of `^''s; see the previous section. (That is,
|
||
"beginning" becomes "end", "next" becomes "previous", and "after"
|
||
becomes "before".)
|
||
|
||
|
||
File: regex.info, Node: GNU Operators, Next: GNU Emacs Operators, Prev: Common Operators, Up: Top
|
||
|
||
GNU Operators
|
||
*************
|
||
|
||
Following are operators that GNU defines (and POSIX doesn't).
|
||
|
||
* Menu:
|
||
|
||
* Word Operators::
|
||
* Buffer Operators::
|
||
|
||
|
||
File: regex.info, Node: Word Operators, Next: Buffer Operators, Up: GNU Operators
|
||
|
||
Word Operators
|
||
==============
|
||
|
||
The operators in this section require Regex to recognize parts of
|
||
words. Regex uses a syntax table to determine whether or not a
|
||
character is part of a word, i.e., whether or not it is
|
||
"word-constituent".
|
||
|
||
* Menu:
|
||
|
||
* Non-Emacs Syntax Tables::
|
||
* Match-word-boundary Operator:: \b
|
||
* Match-within-word Operator:: \B
|
||
* Match-beginning-of-word Operator:: \<
|
||
* Match-end-of-word Operator:: \>
|
||
* Match-word-constituent Operator:: \w
|
||
* Match-non-word-constituent Operator:: \W
|
||
|
||
|
||
File: regex.info, Node: Non-Emacs Syntax Tables, Next: Match-word-boundary Operator, Up: Word Operators
|
||
|
||
Non-Emacs Syntax Tables
|
||
-----------------------
|
||
|
||
A "syntax table" is an array indexed by the characters in your
|
||
character set. In the ASCII encoding, therefore, a syntax table has
|
||
256 elements. Regex always uses a `char *' variable `re_syntax_table'
|
||
as its syntax table. In some cases, it initializes this variable and
|
||
in others it expects you to initialize it.
|
||
|
||
* If Regex is compiled with the preprocessor symbols `emacs' and
|
||
`SYNTAX_TABLE' both undefined, then Regex allocates
|
||
`re_syntax_table' and initializes an element I either to `Sword'
|
||
(which it defines) if I is a letter, number, or `_', or to zero if
|
||
it's not.
|
||
|
||
* If Regex is compiled with `emacs' undefined but `SYNTAX_TABLE'
|
||
defined, then Regex expects you to define a `char *' variable
|
||
`re_syntax_table' to be a valid syntax table.
|
||
|
||
* *Note Emacs Syntax Tables::, for what happens when Regex is
|
||
compiled with the preprocessor symbol `emacs' defined.
|
||
|
||
|
||
File: regex.info, Node: Match-word-boundary Operator, Next: Match-within-word Operator, Prev: Non-Emacs Syntax Tables, Up: Word Operators
|
||
|
||
The Match-word-boundary Operator (`\b')
|
||
---------------------------------------
|
||
|
||
This operator (represented by `\b') matches the empty string at
|
||
either the beginning or the end of a word. For example, `\brat\b'
|
||
matches the separate word `rat'.
|
||
|
||
|
||
File: regex.info, Node: Match-within-word Operator, Next: Match-beginning-of-word Operator, Prev: Match-word-boundary Operator, Up: Word Operators
|
||
|
||
The Match-within-word Operator (`\B')
|
||
-------------------------------------
|
||
|
||
This operator (represented by `\B') matches the empty string within a
|
||
word. For example, `c\Brat\Be' matches `crate', but `dirty \Brat'
|
||
doesn't match `dirty rat'.
|
||
|
||
|
||
File: regex.info, Node: Match-beginning-of-word Operator, Next: Match-end-of-word Operator, Prev: Match-within-word Operator, Up: Word Operators
|
||
|
||
The Match-beginning-of-word Operator (`\<')
|
||
-------------------------------------------
|
||
|
||
This operator (represented by `\<') matches the empty string at the
|
||
beginning of a word.
|
||
|
||
|
||
File: regex.info, Node: Match-end-of-word Operator, Next: Match-word-constituent Operator, Prev: Match-beginning-of-word Operator, Up: Word Operators
|
||
|
||
The Match-end-of-word Operator (`\>')
|
||
-------------------------------------
|
||
|
||
This operator (represented by `\>') matches the empty string at the
|
||
end of a word.
|
||
|
||
|
||
File: regex.info, Node: Match-word-constituent Operator, Next: Match-non-word-constituent Operator, Prev: Match-end-of-word Operator, Up: Word Operators
|
||
|
||
The Match-word-constituent Operator (`\w')
|
||
------------------------------------------
|
||
|
||
This operator (represented by `\w') matches any word-constituent
|
||
character.
|
||
|
||
|
||
File: regex.info, Node: Match-non-word-constituent Operator, Prev: Match-word-constituent Operator, Up: Word Operators
|
||
|
||
The Match-non-word-constituent Operator (`\W')
|
||
----------------------------------------------
|
||
|
||
This operator (represented by `\W') matches any character that is not
|
||
word-constituent.
|
||
|
||
|
||
File: regex.info, Node: Buffer Operators, Prev: Word Operators, Up: GNU Operators
|
||
|
||
Buffer Operators
|
||
================
|
||
|
||
Following are operators which work on buffers. In Emacs, a "buffer"
|
||
is, naturally, an Emacs buffer. For other programs, Regex considers the
|
||
entire string to be matched as the buffer.
|
||
|
||
* Menu:
|
||
|
||
* Match-beginning-of-buffer Operator:: \`
|
||
* Match-end-of-buffer Operator:: \'
|
||
|
||
|
||
File: regex.info, Node: Match-beginning-of-buffer Operator, Next: Match-end-of-buffer Operator, Up: Buffer Operators
|
||
|
||
The Match-beginning-of-buffer Operator (`\`')
|
||
---------------------------------------------
|
||
|
||
This operator (represented by `\`') matches the empty string at the
|
||
beginning of the buffer.
|
||
|
||
|
||
File: regex.info, Node: Match-end-of-buffer Operator, Prev: Match-beginning-of-buffer Operator, Up: Buffer Operators
|
||
|
||
The Match-end-of-buffer Operator (`\'')
|
||
---------------------------------------
|
||
|
||
This operator (represented by `\'') matches the empty string at the
|
||
end of the buffer.
|
||
|
||
|
||
File: regex.info, Node: GNU Emacs Operators, Next: What Gets Matched?, Prev: GNU Operators, Up: Top
|
||
|
||
GNU Emacs Operators
|
||
*******************
|
||
|
||
Following are operators that GNU defines (and POSIX doesn't) that you
|
||
can use only when Regex is compiled with the preprocessor symbol
|
||
`emacs' defined.
|
||
|
||
* Menu:
|
||
|
||
* Syntactic Class Operators::
|
||
|
||
|
||
File: regex.info, Node: Syntactic Class Operators, Up: GNU Emacs Operators
|
||
|
||
Syntactic Class Operators
|
||
=========================
|
||
|
||
The operators in this section require Regex to recognize the syntactic
|
||
classes of characters. Regex uses a syntax table to determine this.
|
||
|
||
* Menu:
|
||
|
||
* Emacs Syntax Tables::
|
||
* Match-syntactic-class Operator:: \sCLASS
|
||
* Match-not-syntactic-class Operator:: \SCLASS
|
||
|
||
|
||
File: regex.info, Node: Emacs Syntax Tables, Next: Match-syntactic-class Operator, Up: Syntactic Class Operators
|
||
|
||
Emacs Syntax Tables
|
||
-------------------
|
||
|
||
A "syntax table" is an array indexed by the characters in your
|
||
character set. In the ASCII encoding, therefore, a syntax table has
|
||
256 elements.
|
||
|
||
If Regex is compiled with the preprocessor symbol `emacs' defined,
|
||
then Regex expects you to define and initialize the variable
|
||
`re_syntax_table' to be an Emacs syntax table. Emacs' syntax tables
|
||
are more complicated than Regex's own (*note Non-Emacs Syntax
|
||
Tables::.). *Note Syntax: (emacs)Syntax, for a description of Emacs'
|
||
syntax tables.
|
||
|
||
|
||
File: regex.info, Node: Match-syntactic-class Operator, Next: Match-not-syntactic-class Operator, Prev: Emacs Syntax Tables, Up: Syntactic Class Operators
|
||
|
||
The Match-syntactic-class Operator (`\s'CLASS)
|
||
----------------------------------------------
|
||
|
||
This operator matches any character whose syntactic class is
|
||
represented by a specified character. `\sCLASS' represents this
|
||
operator where CLASS is the character representing the syntactic class
|
||
you want. For example, `w' represents the syntactic class of
|
||
word-constituent characters, so `\sw' matches any word-constituent
|
||
character.
|
||
|
||
|
||
File: regex.info, Node: Match-not-syntactic-class Operator, Prev: Match-syntactic-class Operator, Up: Syntactic Class Operators
|
||
|
||
The Match-not-syntactic-class Operator (`\S'CLASS)
|
||
--------------------------------------------------
|
||
|
||
This operator is similar to the match-syntactic-class operator except
|
||
that it matches any character whose syntactic class is *not*
|
||
represented by the specified character. `\SCLASS' represents this
|
||
operator. For example, `w' represents the syntactic class of
|
||
word-constituent characters, so `\Sw' matches any character that is not
|
||
word-constituent.
|
||
|
||
|
||
File: regex.info, Node: What Gets Matched?, Next: Programming with Regex, Prev: GNU Emacs Operators, Up: Top
|
||
|
||
What Gets Matched?
|
||
******************
|
||
|
||
Regex usually matches strings according to the "leftmost longest"
|
||
rule; that is, it chooses the longest of the leftmost matches. This
|
||
does not mean that for a regular expression containing subexpressions
|
||
that it simply chooses the longest match for each subexpression, left to
|
||
right; the overall match must also be the longest possible one.
|
||
|
||
For example, `(ac*)(c*d[ac]*)\1' matches `acdacaaa', not `acdac', as
|
||
it would if it were to choose the longest match for the first
|
||
subexpression.
|
||
|
||
|
||
File: regex.info, Node: Programming with Regex, Next: Copying, Prev: What Gets Matched?, Up: Top
|
||
|
||
Programming with Regex
|
||
**********************
|
||
|
||
Here we describe how you use the Regex data structures and functions
|
||
in C programs. Regex has three interfaces: one designed for GNU, one
|
||
compatible with POSIX and one compatible with Berkeley UNIX.
|
||
|
||
* Menu:
|
||
|
||
* GNU Regex Functions::
|
||
* POSIX Regex Functions::
|
||
* BSD Regex Functions::
|
||
|
||
|
||
File: regex.info, Node: GNU Regex Functions, Next: POSIX Regex Functions, Up: Programming with Regex
|
||
|
||
GNU Regex Functions
|
||
===================
|
||
|
||
If you're writing code that doesn't need to be compatible with either
|
||
POSIX or Berkeley UNIX, you can use these functions. They provide more
|
||
options than the other interfaces.
|
||
|
||
* Menu:
|
||
|
||
* GNU Pattern Buffers:: The re_pattern_buffer type.
|
||
* GNU Regular Expression Compiling:: re_compile_pattern ()
|
||
* GNU Matching:: re_match ()
|
||
* GNU Searching:: re_search ()
|
||
* Matching/Searching with Split Data:: re_match_2 (), re_search_2 ()
|
||
* Searching with Fastmaps:: re_compile_fastmap ()
|
||
* GNU Translate Tables:: The `translate' field.
|
||
* Using Registers:: The re_registers type and related fns.
|
||
* Freeing GNU Pattern Buffers:: regfree ()
|
||
|
||
|
||
File: regex.info, Node: GNU Pattern Buffers, Next: GNU Regular Expression Compiling, Up: GNU Regex Functions
|
||
|
||
GNU Pattern Buffers
|
||
-------------------
|
||
|
||
To compile, match, or search for a given regular expression, you must
|
||
supply a pattern buffer. A "pattern buffer" holds one compiled regular
|
||
expression.(1)
|
||
|
||
You can have several different pattern buffers simultaneously, each
|
||
holding a compiled pattern for a different regular expression.
|
||
|
||
`regex.h' defines the pattern buffer `struct' as follows:
|
||
|
||
/* Space that holds the compiled pattern. It is declared as
|
||
`unsigned char *' because its elements are
|
||
sometimes used as array indexes. */
|
||
unsigned char *buffer;
|
||
|
||
/* Number of bytes to which `buffer' points. */
|
||
unsigned long allocated;
|
||
|
||
/* Number of bytes actually used in `buffer'. */
|
||
unsigned long used;
|
||
|
||
/* Syntax setting with which the pattern was compiled. */
|
||
reg_syntax_t syntax;
|
||
|
||
/* Pointer to a fastmap, if any, otherwise zero. re_search uses
|
||
the fastmap, if there is one, to skip over impossible
|
||
starting points for matches. */
|
||
char *fastmap;
|
||
|
||
/* Either a translate table to apply to all characters before
|
||
comparing them, or zero for no translation. The translation
|
||
is applied to a pattern when it is compiled and to a string
|
||
when it is matched. */
|
||
char *translate;
|
||
|
||
/* Number of subexpressions found by the compiler. */
|
||
size_t re_nsub;
|
||
|
||
/* Zero if this pattern cannot match the empty string, one else.
|
||
Well, in truth it's used only in `re_search_2', to see
|
||
whether or not we should use the fastmap, so we don't set
|
||
this absolutely perfectly; see `re_compile_fastmap' (the
|
||
`duplicate' case). */
|
||
unsigned can_be_null : 1;
|
||
|
||
/* If REGS_UNALLOCATED, allocate space in the `regs' structure
|
||
for `max (RE_NREGS, re_nsub + 1)' groups.
|
||
If REGS_REALLOCATE, reallocate space if necessary.
|
||
If REGS_FIXED, use what's there. */
|
||
#define REGS_UNALLOCATED 0
|
||
#define REGS_REALLOCATE 1
|
||
#define REGS_FIXED 2
|
||
unsigned regs_allocated : 2;
|
||
|
||
/* Set to zero when `regex_compile' compiles a pattern; set to one
|
||
by `re_compile_fastmap' if it updates the fastmap. */
|
||
unsigned fastmap_accurate : 1;
|
||
|
||
/* If set, `re_match_2' does not return information about
|
||
subexpressions. */
|
||
unsigned no_sub : 1;
|
||
|
||
/* If set, a beginning-of-line anchor doesn't match at the
|
||
beginning of the string. */
|
||
unsigned not_bol : 1;
|
||
|
||
/* Similarly for an end-of-line anchor. */
|
||
unsigned not_eol : 1;
|
||
|
||
/* If true, an anchor at a newline matches. */
|
||
unsigned newline_anchor : 1;
|
||
|
||
---------- Footnotes ----------
|
||
|
||
(1) Regular expressions are also referred to as "patterns," hence
|
||
the name "pattern buffer."
|
||
|
||
|
||
File: regex.info, Node: GNU Regular Expression Compiling, Next: GNU Matching, Prev: GNU Pattern Buffers, Up: GNU Regex Functions
|
||
|
||
GNU Regular Expression Compiling
|
||
--------------------------------
|
||
|
||
In GNU, you can both match and search for a given regular expression.
|
||
To do either, you must first compile it in a pattern buffer (*note GNU
|
||
Pattern Buffers::.).
|
||
|
||
Regular expressions match according to the syntax with which they were
|
||
compiled; with GNU, you indicate what syntax you want by setting the
|
||
variable `re_syntax_options' (declared in `regex.h' and defined in
|
||
`regex.c') before calling the compiling function, `re_compile_pattern'
|
||
(see below). *Note Syntax Bits::, and *Note Predefined Syntaxes::.
|
||
|
||
You can change the value of `re_syntax_options' at any time.
|
||
Usually, however, you set its value once and then never change it.
|
||
|
||
`re_compile_pattern' takes a pattern buffer as an argument. You must
|
||
initialize the following fields:
|
||
|
||
`translate initialization'
|
||
`translate'
|
||
Initialize this to point to a translate table if you want one, or
|
||
to zero if you don't. We explain translate tables in *Note GNU
|
||
Translate Tables::.
|
||
|
||
`fastmap'
|
||
Initialize this to nonzero if you want a fastmap, or to zero if you
|
||
don't.
|
||
|
||
`buffer'
|
||
`allocated'
|
||
If you want `re_compile_pattern' to allocate memory for the
|
||
compiled pattern, set both of these to zero. If you have an
|
||
existing block of memory (allocated with `malloc') you want Regex
|
||
to use, set `buffer' to its address and `allocated' to its size (in
|
||
bytes).
|
||
|
||
`re_compile_pattern' uses `realloc' to extend the space for the
|
||
compiled pattern as necessary.
|
||
|
||
To compile a pattern buffer, use:
|
||
|
||
char *
|
||
re_compile_pattern (const char *REGEX, const int REGEX_SIZE,
|
||
struct re_pattern_buffer *PATTERN_BUFFER)
|
||
|
||
REGEX is the regular expression's address, REGEX_SIZE is its length,
|
||
and PATTERN_BUFFER is the pattern buffer's address.
|
||
|
||
If `re_compile_pattern' successfully compiles the regular expression,
|
||
it returns zero and sets `*PATTERN_BUFFER' to the compiled pattern. It
|
||
sets the pattern buffer's fields as follows:
|
||
|
||
`buffer'
|
||
to the compiled pattern.
|
||
|
||
`used'
|
||
to the number of bytes the compiled pattern in `buffer' occupies.
|
||
|
||
`syntax'
|
||
to the current value of `re_syntax_options'.
|
||
|
||
`re_nsub'
|
||
to the number of subexpressions in REGEX.
|
||
|
||
`fastmap_accurate'
|
||
to zero on the theory that the pattern you're compiling is
|
||
different than the one previously compiled into `buffer'; in that
|
||
case (since you can't make a fastmap without a compiled pattern),
|
||
`fastmap' would either contain an incompatible fastmap, or nothing
|
||
at all.
|
||
|
||
If `re_compile_pattern' can't compile REGEX, it returns an error
|
||
string corresponding to one of the errors listed in *Note POSIX Regular
|
||
Expression Compiling::.
|
||
|
||
|
||
File: regex.info, Node: GNU Matching, Next: GNU Searching, Prev: GNU Regular Expression Compiling, Up: GNU Regex Functions
|
||
|
||
GNU Matching
|
||
------------
|
||
|
||
Matching the GNU way means trying to match as much of a string as
|
||
possible starting at a position within it you specify. Once you've
|
||
compiled a pattern into a pattern buffer (*note GNU Regular Expression
|
||
Compiling::.), you can ask the matcher to match that pattern against a
|
||
string using:
|
||
|
||
int
|
||
re_match (struct re_pattern_buffer *PATTERN_BUFFER,
|
||
const char *STRING, const int SIZE,
|
||
const int START, struct re_registers *REGS)
|
||
|
||
PATTERN_BUFFER is the address of a pattern buffer containing a compiled
|
||
pattern. STRING is the string you want to match; it can contain
|
||
newline and null characters. SIZE is the length of that string. START
|
||
is the string index at which you want to begin matching; the first
|
||
character of STRING is at index zero. *Note Using Registers::, for a
|
||
explanation of REGS; you can safely pass zero.
|
||
|
||
`re_match' matches the regular expression in PATTERN_BUFFER against
|
||
the string STRING according to the syntax in PATTERN_BUFFERS's `syntax'
|
||
field. (*Note GNU Regular Expression Compiling::, for how to set it.)
|
||
The function returns -1 if the compiled pattern does not match any part
|
||
of STRING and -2 if an internal error happens; otherwise, it returns
|
||
how many (possibly zero) characters of STRING the pattern matched.
|
||
|
||
An example: suppose PATTERN_BUFFER points to a pattern buffer
|
||
containing the compiled pattern for `a*', and STRING points to `aaaaab'
|
||
(whereupon SIZE should be 6). Then if START is 2, `re_match' returns 3,
|
||
i.e., `a*' would have matched the last three `a's in STRING. If START
|
||
is 0, `re_match' returns 5, i.e., `a*' would have matched all the `a's
|
||
in STRING. If START is either 5 or 6, it returns zero.
|
||
|
||
If START is not between zero and SIZE, then `re_match' returns -1.
|
||
|
||
|
||
File: regex.info, Node: GNU Searching, Next: Matching/Searching with Split Data, Prev: GNU Matching, Up: GNU Regex Functions
|
||
|
||
GNU Searching
|
||
-------------
|
||
|
||
"Searching" means trying to match starting at successive positions
|
||
within a string. The function `re_search' does this.
|
||
|
||
Before calling `re_search', you must compile your regular expression.
|
||
*Note GNU Regular Expression Compiling::.
|
||
|
||
Here is the function declaration:
|
||
|
||
int
|
||
re_search (struct re_pattern_buffer *PATTERN_BUFFER,
|
||
const char *STRING, const int SIZE,
|
||
const int START, const int RANGE,
|
||
struct re_registers *REGS)
|
||
|
||
whose arguments are the same as those to `re_match' (*note GNU
|
||
Matching::.) except that the two arguments START and RANGE replace
|
||
`re_match''s argument START.
|
||
|
||
If RANGE is positive, then `re_search' attempts a match starting
|
||
first at index START, then at START + 1 if that fails, and so on, up to
|
||
START + RANGE; if RANGE is negative, then it attempts a match starting
|
||
first at index START, then at START -1 if that fails, and so on.
|
||
|
||
If START is not between zero and SIZE, then `re_search' returns -1.
|
||
When RANGE is positive, `re_search' adjusts RANGE so that START + RANGE
|
||
- 1 is between zero and SIZE, if necessary; that way it won't search
|
||
outside of STRING. Similarly, when RANGE is negative, `re_search'
|
||
adjusts RANGE so that START + RANGE + 1 is between zero and SIZE, if
|
||
necessary.
|
||
|
||
If the `fastmap' field of PATTERN_BUFFER is zero, `re_search' matches
|
||
starting at consecutive positions; otherwise, it uses `fastmap' to make
|
||
the search more efficient. *Note Searching with Fastmaps::.
|
||
|
||
If no match is found, `re_search' returns -1. If a match is found,
|
||
it returns the index where the match began. If an internal error
|
||
happens, it returns -2.
|
||
|
||
|
||
File: regex.info, Node: Matching/Searching with Split Data, Next: Searching with Fastmaps, Prev: GNU Searching, Up: GNU Regex Functions
|
||
|
||
Matching and Searching with Split Data
|
||
--------------------------------------
|
||
|
||
Using the functions `re_match_2' and `re_search_2', you can match or
|
||
search in data that is divided into two strings.
|
||
|
||
The function:
|
||
|
||
int
|
||
re_match_2 (struct re_pattern_buffer *BUFFER,
|
||
const char *STRING1, const int SIZE1,
|
||
const char *STRING2, const int SIZE2,
|
||
const int START,
|
||
struct re_registers *REGS,
|
||
const int STOP)
|
||
|
||
is similar to `re_match' (*note GNU Matching::.) except that you pass
|
||
*two* data strings and sizes, and an index STOP beyond which you don't
|
||
want the matcher to try matching. As with `re_match', if it succeeds,
|
||
`re_match_2' returns how many characters of STRING it matched. Regard
|
||
STRING1 and STRING2 as concatenated when you set the arguments START and
|
||
STOP and use the contents of REGS; `re_match_2' never returns a value
|
||
larger than SIZE1 + SIZE2.
|
||
|
||
The function:
|
||
|
||
int
|
||
re_search_2 (struct re_pattern_buffer *BUFFER,
|
||
const char *STRING1, const int SIZE1,
|
||
const char *STRING2, const int SIZE2,
|
||
const int START, const int RANGE,
|
||
struct re_registers *REGS,
|
||
const int STOP)
|
||
|
||
is similarly related to `re_search'.
|
||
|
||
|
||
File: regex.info, Node: Searching with Fastmaps, Next: GNU Translate Tables, Prev: Matching/Searching with Split Data, Up: GNU Regex Functions
|
||
|
||
Searching with Fastmaps
|
||
-----------------------
|
||
|
||
If you're searching through a long string, you should use a fastmap.
|
||
Without one, the searcher tries to match at consecutive positions in the
|
||
string. Generally, most of the characters in the string could not start
|
||
a match. It takes much longer to try matching at a given position in
|
||
the string than it does to check in a table whether or not the
|
||
character at that position could start a match. A "fastmap" is such a
|
||
table.
|
||
|
||
More specifically, a fastmap is an array indexed by the characters in
|
||
your character set. Under the ASCII encoding, therefore, a fastmap has
|
||
256 elements. If you want the searcher to use a fastmap with a given
|
||
pattern buffer, you must allocate the array and assign the array's
|
||
address to the pattern buffer's `fastmap' field. You either can
|
||
compile the fastmap yourself or have `re_search' do it for you; when
|
||
`fastmap' is nonzero, it automatically compiles a fastmap the first
|
||
time you search using a particular compiled pattern.
|
||
|
||
To compile a fastmap yourself, use:
|
||
|
||
int
|
||
re_compile_fastmap (struct re_pattern_buffer *PATTERN_BUFFER)
|
||
|
||
PATTERN_BUFFER is the address of a pattern buffer. If the character C
|
||
could start a match for the pattern, `re_compile_fastmap' makes
|
||
`PATTERN_BUFFER->fastmap[C]' nonzero. It returns 0 if it can compile a
|
||
fastmap and -2 if there is an internal error. For example, if `|' is
|
||
the alternation operator and PATTERN_BUFFER holds the compiled pattern
|
||
for `a|b', then `re_compile_fastmap' sets `fastmap['a']' and
|
||
`fastmap['b']' (and no others).
|
||
|
||
`re_search' uses a fastmap as it moves along in the string: it checks
|
||
the string's characters until it finds one that's in the fastmap. Then
|
||
it tries matching at that character. If the match fails, it repeats
|
||
the process. So, by using a fastmap, `re_search' doesn't waste time
|
||
trying to match at positions in the string that couldn't start a match.
|
||
|
||
If you don't want `re_search' to use a fastmap, store zero in the
|
||
`fastmap' field of the pattern buffer before calling `re_search'.
|
||
|
||
Once you've initialized a pattern buffer's `fastmap' field, you need
|
||
never do so again--even if you compile a new pattern in it--provided
|
||
the way the field is set still reflects whether or not you want a
|
||
fastmap. `re_search' will still either do nothing if `fastmap' is null
|
||
or, if it isn't, compile a new fastmap for the new pattern.
|
||
|
||
|
||
File: regex.info, Node: GNU Translate Tables, Next: Using Registers, Prev: Searching with Fastmaps, Up: GNU Regex Functions
|
||
|
||
GNU Translate Tables
|
||
--------------------
|
||
|
||
If you set the `translate' field of a pattern buffer to a translate
|
||
table, then the GNU Regex functions to which you've passed that pattern
|
||
buffer use it to apply a simple transformation to all the regular
|
||
expression and string characters at which they look.
|
||
|
||
A "translate table" is an array indexed by the characters in your
|
||
character set. Under the ASCII encoding, therefore, a translate table
|
||
has 256 elements. The array's elements are also characters in your
|
||
character set. When the Regex functions see a character C, they use
|
||
`translate[C]' in its place, with one exception: the character after a
|
||
`\' is not translated. (This ensures that, the operators, e.g., `\B'
|
||
and `\b', are always distinguishable.)
|
||
|
||
For example, a table that maps all lowercase letters to the
|
||
corresponding uppercase ones would cause the matcher to ignore
|
||
differences in case.(1) Such a table would map all characters except
|
||
lowercase letters to themselves, and lowercase letters to the
|
||
corresponding uppercase ones. Under the ASCII encoding, here's how you
|
||
could initialize such a table (we'll call it `case_fold'):
|
||
|
||
for (i = 0; i < 256; i++)
|
||
case_fold[i] = i;
|
||
for (i = 'a'; i <= 'z'; i++)
|
||
case_fold[i] = i - ('a' - 'A');
|
||
|
||
You tell Regex to use a translate table on a given pattern buffer by
|
||
assigning that table's address to the `translate' field of that buffer.
|
||
If you don't want Regex to do any translation, put zero into this
|
||
field. You'll get weird results if you change the table's contents
|
||
anytime between compiling the pattern buffer, compiling its fastmap, and
|
||
matching or searching with the pattern buffer.
|
||
|
||
---------- Footnotes ----------
|
||
|
||
(1) A table that maps all uppercase letters to the corresponding
|
||
lowercase ones would work just as well for this purpose.
|
||
|
||
|
||
File: regex.info, Node: Using Registers, Next: Freeing GNU Pattern Buffers, Prev: GNU Translate Tables, Up: GNU Regex Functions
|
||
|
||
Using Registers
|
||
---------------
|
||
|
||
A group in a regular expression can match a (posssibly empty)
|
||
substring of the string that regular expression as a whole matched.
|
||
The matcher remembers the beginning and end of the substring matched by
|
||
each group.
|
||
|
||
To find out what they matched, pass a nonzero REGS argument to a GNU
|
||
matching or searching function (*note GNU Matching::. and *Note GNU
|
||
Searching::), i.e., the address of a structure of this type, as defined
|
||
in `regex.h':
|
||
|
||
struct re_registers
|
||
{
|
||
unsigned num_regs;
|
||
regoff_t *start;
|
||
regoff_t *end;
|
||
};
|
||
|
||
Except for (possibly) the NUM_REGS'th element (see below), the Ith
|
||
element of the `start' and `end' arrays records information about the
|
||
Ith group in the pattern. (They're declared as C pointers, but this is
|
||
only because not all C compilers accept zero-length arrays;
|
||
conceptually, it is simplest to think of them as arrays.)
|
||
|
||
The `start' and `end' arrays are allocated in various ways, depending
|
||
on the value of the `regs_allocated' field in the pattern buffer passed
|
||
to the matcher.
|
||
|
||
The simplest and perhaps most useful is to let the matcher
|
||
(re)allocate enough space to record information for all the groups in
|
||
the regular expression. If `regs_allocated' is `REGS_UNALLOCATED', the
|
||
matcher allocates 1 + RE_NSUB (another field in the pattern buffer;
|
||
*note GNU Pattern Buffers::.). The extra element is set to -1, and
|
||
sets `regs_allocated' to `REGS_REALLOCATE'. Then on subsequent calls
|
||
with the same pattern buffer and REGS arguments, the matcher
|
||
reallocates more space if necessary.
|
||
|
||
It would perhaps be more logical to make the `regs_allocated' field
|
||
part of the `re_registers' structure, instead of part of the pattern
|
||
buffer. But in that case the caller would be forced to initialize the
|
||
structure before passing it. Much existing code doesn't do this
|
||
initialization, and it's arguably better to avoid it anyway.
|
||
|
||
`re_compile_pattern' sets `regs_allocated' to `REGS_UNALLOCATED', so
|
||
if you use the GNU regular expression functions, you get this behavior
|
||
by default.
|
||
|
||
xx document re_set_registers
|
||
|
||
POSIX, on the other hand, requires a different interface: the caller
|
||
is supposed to pass in a fixed-length array which the matcher fills.
|
||
Therefore, if `regs_allocated' is `REGS_FIXED' the matcher simply fills
|
||
that array.
|
||
|
||
The following examples illustrate the information recorded in the
|
||
`re_registers' structure. (In all of them, `(' represents the
|
||
open-group and `)' the close-group operator. The first character in
|
||
the string STRING is at index 0.)
|
||
|
||
* If the regular expression has an I-th group not contained within
|
||
another group that matches a substring of STRING, then the
|
||
function sets `REGS->start[I]' to the index in STRING where the
|
||
substring matched by the I-th group begins, and `REGS->end[I]' to
|
||
the index just beyond that substring's end. The function sets
|
||
`REGS->start[0]' and `REGS->end[0]' to analogous information about
|
||
the entire pattern.
|
||
|
||
For example, when you match `((a)(b))' against `ab', you get:
|
||
|
||
* 0 in `REGS->start[0]' and 2 in `REGS->end[0]'
|
||
|
||
* 0 in `REGS->start[1]' and 2 in `REGS->end[1]'
|
||
|
||
* 0 in `REGS->start[2]' and 1 in `REGS->end[2]'
|
||
|
||
* 1 in `REGS->start[3]' and 2 in `REGS->end[3]'
|
||
|
||
* If a group matches more than once (as it might if followed by,
|
||
e.g., a repetition operator), then the function reports the
|
||
information about what the group *last* matched.
|
||
|
||
For example, when you match the pattern `(a)*' against the string
|
||
`aa', you get:
|
||
|
||
* 0 in `REGS->start[0]' and 2 in `REGS->end[0]'
|
||
|
||
* 1 in `REGS->start[1]' and 2 in `REGS->end[1]'
|
||
|
||
* If the I-th group does not participate in a successful match,
|
||
e.g., it is an alternative not taken or a repetition operator
|
||
allows zero repetitions of it, then the function sets
|
||
`REGS->start[I]' and `REGS->end[I]' to -1.
|
||
|
||
For example, when you match the pattern `(a)*b' against the string
|
||
`b', you get:
|
||
|
||
* 0 in `REGS->start[0]' and 1 in `REGS->end[0]'
|
||
|
||
* -1 in `REGS->start[1]' and -1 in `REGS->end[1]'
|
||
|
||
* If the I-th group matches a zero-length string, then the function
|
||
sets `REGS->start[I]' and `REGS->end[I]' to the index just beyond
|
||
that zero-length string.
|
||
|
||
For example, when you match the pattern `(a*)b' against the string
|
||
`b', you get:
|
||
|
||
* 0 in `REGS->start[0]' and 1 in `REGS->end[0]'
|
||
|
||
* 0 in `REGS->start[1]' and 0 in `REGS->end[1]'
|
||
|
||
* If an I-th group contains a J-th group in turn not contained
|
||
within any other group within group I and the function reports a
|
||
match of the I-th group, then it records in `REGS->start[J]' and
|
||
`REGS->end[J]' the last match (if it matched) of the J-th group.
|
||
|
||
For example, when you match the pattern `((a*)b)*' against the
|
||
string `abb', group 2 last matches the empty string, so you get
|
||
what it previously matched:
|
||
|
||
* 0 in `REGS->start[0]' and 3 in `REGS->end[0]'
|
||
|
||
* 2 in `REGS->start[1]' and 3 in `REGS->end[1]'
|
||
|
||
* 2 in `REGS->start[2]' and 2 in `REGS->end[2]'
|
||
|
||
When you match the pattern `((a)*b)*' against the string `abb',
|
||
group 2 doesn't participate in the last match, so you get:
|
||
|
||
* 0 in `REGS->start[0]' and 3 in `REGS->end[0]'
|
||
|
||
* 2 in `REGS->start[1]' and 3 in `REGS->end[1]'
|
||
|
||
* 0 in `REGS->start[2]' and 1 in `REGS->end[2]'
|
||
|
||
* If an I-th group contains a J-th group in turn not contained
|
||
within any other group within group I and the function sets
|
||
`REGS->start[I]' and `REGS->end[I]' to -1, then it also sets
|
||
`REGS->start[J]' and `REGS->end[J]' to -1.
|
||
|
||
For example, when you match the pattern `((a)*b)*c' against the
|
||
string `c', you get:
|
||
|
||
* 0 in `REGS->start[0]' and 1 in `REGS->end[0]'
|
||
|
||
* -1 in `REGS->start[1]' and -1 in `REGS->end[1]'
|
||
|
||
* -1 in `REGS->start[2]' and -1 in `REGS->end[2]'
|
||
|
||
|
||
File: regex.info, Node: Freeing GNU Pattern Buffers, Prev: Using Registers, Up: GNU Regex Functions
|
||
|
||
Freeing GNU Pattern Buffers
|
||
---------------------------
|
||
|
||
To free any allocated fields of a pattern buffer, you can use the
|
||
POSIX function described in *Note Freeing POSIX Pattern Buffers::,
|
||
since the type `regex_t'--the type for POSIX pattern buffers--is
|
||
equivalent to the type `re_pattern_buffer'. After freeing a pattern
|
||
buffer, you need to again compile a regular expression in it (*note GNU
|
||
Regular Expression Compiling::.) before passing it to a matching or
|
||
searching function.
|
||
|
||
|
||
File: regex.info, Node: POSIX Regex Functions, Next: BSD Regex Functions, Prev: GNU Regex Functions, Up: Programming with Regex
|
||
|
||
POSIX Regex Functions
|
||
=====================
|
||
|
||
If you're writing code that has to be POSIX compatible, you'll need
|
||
to use these functions. Their interfaces are as specified by POSIX,
|
||
draft 1003.2/D11.2.
|
||
|
||
* Menu:
|
||
|
||
* POSIX Pattern Buffers:: The regex_t type.
|
||
* POSIX Regular Expression Compiling:: regcomp ()
|
||
* POSIX Matching:: regexec ()
|
||
* Reporting Errors:: regerror ()
|
||
* Using Byte Offsets:: The regmatch_t type.
|
||
* Freeing POSIX Pattern Buffers:: regfree ()
|
||
|
||
|
||
File: regex.info, Node: POSIX Pattern Buffers, Next: POSIX Regular Expression Compiling, Up: POSIX Regex Functions
|
||
|
||
POSIX Pattern Buffers
|
||
---------------------
|
||
|
||
To compile or match a given regular expression the POSIX way, you
|
||
must supply a pattern buffer exactly the way you do for GNU (*note GNU
|
||
Pattern Buffers::.). POSIX pattern buffers have type `regex_t', which
|
||
is equivalent to the GNU pattern buffer type `re_pattern_buffer'.
|
||
|
||
|
||
File: regex.info, Node: POSIX Regular Expression Compiling, Next: POSIX Matching, Prev: POSIX Pattern Buffers, Up: POSIX Regex Functions
|
||
|
||
POSIX Regular Expression Compiling
|
||
----------------------------------
|
||
|
||
With POSIX, you can only search for a given regular expression; you
|
||
can't match it. To do this, you must first compile it in a pattern
|
||
buffer, using `regcomp'.
|
||
|
||
To compile a pattern buffer, use:
|
||
|
||
int
|
||
regcomp (regex_t *PREG, const char *REGEX, int CFLAGS)
|
||
|
||
PREG is the initialized pattern buffer's address, REGEX is the regular
|
||
expression's address, and CFLAGS is the compilation flags, which Regex
|
||
considers as a collection of bits. Here are the valid bits, as defined
|
||
in `regex.h':
|
||
|
||
`REG_EXTENDED'
|
||
says to use POSIX Extended Regular Expression syntax; if this isn't
|
||
set, then says to use POSIX Basic Regular Expression syntax.
|
||
`regcomp' sets PREG's `syntax' field accordingly.
|
||
|
||
`REG_ICASE'
|
||
says to ignore case; `regcomp' sets PREG's `translate' field to a
|
||
translate table which ignores case, replacing anything you've put
|
||
there before.
|
||
|
||
`REG_NOSUB'
|
||
says to set PREG's `no_sub' field; *note POSIX Matching::., for
|
||
what this means.
|
||
|
||
`REG_NEWLINE'
|
||
says that a:
|
||
|
||
* match-any-character operator (*note Match-any-character
|
||
Operator::.) doesn't match a newline.
|
||
|
||
* nonmatching list not containing a newline (*note List
|
||
Operators::.) matches a newline.
|
||
|
||
* match-beginning-of-line operator (*note
|
||
Match-beginning-of-line Operator::.) matches the empty string
|
||
immediately after a newline, regardless of how `REG_NOTBOL'
|
||
is set (*note POSIX Matching::., for an explanation of
|
||
`REG_NOTBOL').
|
||
|
||
* match-end-of-line operator (*note Match-beginning-of-line
|
||
Operator::.) matches the empty string immediately before a
|
||
newline, regardless of how `REG_NOTEOL' is set (*note POSIX
|
||
Matching::., for an explanation of `REG_NOTEOL').
|
||
|
||
If `regcomp' successfully compiles the regular expression, it returns
|
||
zero and sets `*PATTERN_BUFFER' to the compiled pattern. Except for
|
||
`syntax' (which it sets as explained above), it also sets the same
|
||
fields the same way as does the GNU compiling function (*note GNU
|
||
Regular Expression Compiling::.).
|
||
|
||
If `regcomp' can't compile the regular expression, it returns one of
|
||
the error codes listed here. (Except when noted differently, the
|
||
syntax of in all examples below is basic regular expression syntax.)
|
||
|
||
`REG_BADRPT'
|
||
For example, the consecutive repetition operators `**' in `a**'
|
||
are invalid. As another example, if the syntax is extended
|
||
regular expression syntax, then the repetition operator `*' with
|
||
nothing on which to operate in `*' is invalid.
|
||
|
||
`REG_BADBR'
|
||
For example, the COUNT `-1' in `a\{-1' is invalid.
|
||
|
||
`REG_EBRACE'
|
||
For example, `a\{1' is missing a close-interval operator.
|
||
|
||
`REG_EBRACK'
|
||
For example, `[a' is missing a close-list operator.
|
||
|
||
`REG_ERANGE'
|
||
For example, the range ending point `z' that collates lower than
|
||
does its starting point `a' in `[z-a]' is invalid. Also, the
|
||
range with the character class `[:alpha:]' as its starting point in
|
||
`[[:alpha:]-|]'.
|
||
|
||
`REG_ECTYPE'
|
||
For example, the character class name `foo' in `[[:foo:]' is
|
||
invalid.
|
||
|
||
`REG_EPAREN'
|
||
For example, `a\)' is missing an open-group operator and `\(a' is
|
||
missing a close-group operator.
|
||
|
||
`REG_ESUBREG'
|
||
For example, the back reference `\2' that refers to a nonexistent
|
||
subexpression in `\(a\)\2' is invalid.
|
||
|
||
`REG_EEND'
|
||
Returned when a regular expression causes no other more specific
|
||
error.
|
||
|
||
`REG_EESCAPE'
|
||
For example, the trailing backslash `\' in `a\' is invalid, as is
|
||
the one in `\'.
|
||
|
||
`REG_BADPAT'
|
||
For example, in the extended regular expression syntax, the empty
|
||
group `()' in `a()b' is invalid.
|
||
|
||
`REG_ESIZE'
|
||
Returned when a regular expression needs a pattern buffer larger
|
||
than 65536 bytes.
|
||
|
||
`REG_ESPACE'
|
||
Returned when a regular expression makes Regex to run out of
|
||
memory.
|
||
|
||
|
||
File: regex.info, Node: POSIX Matching, Next: Reporting Errors, Prev: POSIX Regular Expression Compiling, Up: POSIX Regex Functions
|
||
|
||
POSIX Matching
|
||
--------------
|
||
|
||
Matching the POSIX way means trying to match a null-terminated string
|
||
starting at its first character. Once you've compiled a pattern into a
|
||
pattern buffer (*note POSIX Regular Expression Compiling::.), you can
|
||
ask the matcher to match that pattern against a string using:
|
||
|
||
int
|
||
regexec (const regex_t *PREG, const char *STRING,
|
||
size_t NMATCH, regmatch_t PMATCH[], int EFLAGS)
|
||
|
||
PREG is the address of a pattern buffer for a compiled pattern. STRING
|
||
is the string you want to match.
|
||
|
||
*Note Using Byte Offsets::, for an explanation of PMATCH. If you
|
||
pass zero for NMATCH or you compiled PREG with the compilation flag
|
||
`REG_NOSUB' set, then `regexec' will ignore PMATCH; otherwise, you must
|
||
allocate it to have at least NMATCH elements. `regexec' will record
|
||
NMATCH byte offsets in PMATCH, and set to -1 any unused elements up to
|
||
PMATCH`[NMATCH]' - 1.
|
||
|
||
EFLAGS specifies "execution flags"--namely, the two bits `REG_NOTBOL'
|
||
and `REG_NOTEOL' (defined in `regex.h'). If you set `REG_NOTBOL', then
|
||
the match-beginning-of-line operator (*note Match-beginning-of-line
|
||
Operator::.) always fails to match. This lets you match against pieces
|
||
of a line, as you would need to if, say, searching for repeated
|
||
instances of a given pattern in a line; it would work correctly for
|
||
patterns both with and without match-beginning-of-line operators.
|
||
`REG_NOTEOL' works analogously for the match-end-of-line operator
|
||
(*note Match-end-of-line Operator::.); it exists for symmetry.
|
||
|
||
`regexec' tries to find a match for PREG in STRING according to the
|
||
syntax in PREG's `syntax' field. (*Note POSIX Regular Expression
|
||
Compiling::, for how to set it.) The function returns zero if the
|
||
compiled pattern matches STRING and `REG_NOMATCH' (defined in
|
||
`regex.h') if it doesn't.
|
||
|
||
|
||
File: regex.info, Node: Reporting Errors, Next: Using Byte Offsets, Prev: POSIX Matching, Up: POSIX Regex Functions
|
||
|
||
Reporting Errors
|
||
----------------
|
||
|
||
If either `regcomp' or `regexec' fail, they return a nonzero error
|
||
code, the possibilities for which are defined in `regex.h'. *Note
|
||
POSIX Regular Expression Compiling::, and *Note POSIX Matching::, for
|
||
what these codes mean. To get an error string corresponding to these
|
||
codes, you can use:
|
||
|
||
size_t
|
||
regerror (int ERRCODE,
|
||
const regex_t *PREG,
|
||
char *ERRBUF,
|
||
size_t ERRBUF_SIZE)
|
||
|
||
ERRCODE is an error code, PREG is the address of the pattern buffer
|
||
which provoked the error, ERRBUF is the error buffer, and ERRBUF_SIZE
|
||
is ERRBUF's size.
|
||
|
||
`regerror' returns the size in bytes of the error string
|
||
corresponding to ERRCODE (including its terminating null). If ERRBUF
|
||
and ERRBUF_SIZE are nonzero, it also returns in ERRBUF the first
|
||
ERRBUF_SIZE - 1 characters of the error string, followed by a null.
|
||
eRRBUF_SIZE must be a nonnegative number less than or equal to the size
|
||
in bytes of ERRBUF.
|
||
|
||
You can call `regerror' with a null ERRBUF and a zero ERRBUF_SIZE to
|
||
determine how large ERRBUF need be to accommodate `regerror''s error
|
||
string.
|
||
|
||
|
||
File: regex.info, Node: Using Byte Offsets, Next: Freeing POSIX Pattern Buffers, Prev: Reporting Errors, Up: POSIX Regex Functions
|
||
|
||
Using Byte Offsets
|
||
------------------
|
||
|
||
In POSIX, variables of type `regmatch_t' hold analogous information,
|
||
but are not identical to, GNU's registers (*note Using Registers::.).
|
||
To get information about registers in POSIX, pass to `regexec' a
|
||
nonzero PMATCH of type `regmatch_t', i.e., the address of a structure
|
||
of this type, defined in `regex.h':
|
||
|
||
typedef struct
|
||
{
|
||
regoff_t rm_so;
|
||
regoff_t rm_eo;
|
||
} regmatch_t;
|
||
|
||
When reading in *Note Using Registers::, about how the matching
|
||
function stores the information into the registers, substitute PMATCH
|
||
for REGS, `PMATCH[I]->rm_so' for `REGS->start[I]' and
|
||
`PMATCH[I]->rm_eo' for `REGS->end[I]'.
|
||
|
||
|
||
File: regex.info, Node: Freeing POSIX Pattern Buffers, Prev: Using Byte Offsets, Up: POSIX Regex Functions
|
||
|
||
Freeing POSIX Pattern Buffers
|
||
-----------------------------
|
||
|
||
To free any allocated fields of a pattern buffer, use:
|
||
|
||
void
|
||
regfree (regex_t *PREG)
|
||
|
||
PREG is the pattern buffer whose allocated fields you want freed.
|
||
`regfree' also sets PREG's `allocated' and `used' fields to zero.
|
||
After freeing a pattern buffer, you need to again compile a regular
|
||
expression in it (*note POSIX Regular Expression Compiling::.) before
|
||
passing it to the matching function (*note POSIX Matching::.).
|
||
|
||
|
||
File: regex.info, Node: BSD Regex Functions, Prev: POSIX Regex Functions, Up: Programming with Regex
|
||
|
||
BSD Regex Functions
|
||
===================
|
||
|
||
If you're writing code that has to be Berkeley UNIX compatible,
|
||
you'll need to use these functions whose interfaces are the same as
|
||
those in Berkeley UNIX.
|
||
|
||
* Menu:
|
||
|
||
* BSD Regular Expression Compiling:: re_comp ()
|
||
* BSD Searching:: re_exec ()
|
||
|
||
|
||
File: regex.info, Node: BSD Regular Expression Compiling, Next: BSD Searching, Up: BSD Regex Functions
|
||
|
||
BSD Regular Expression Compiling
|
||
--------------------------------
|
||
|
||
With Berkeley UNIX, you can only search for a given regular
|
||
expression; you can't match one. To search for it, you must first
|
||
compile it. Before you compile it, you must indicate the regular
|
||
expression syntax you want it compiled according to by setting the
|
||
variable `re_syntax_options' (declared in `regex.h' to some syntax
|
||
(*note Regular Expression Syntax::.).
|
||
|
||
To compile a regular expression use:
|
||
|
||
char *
|
||
re_comp (char *REGEX)
|
||
|
||
REGEX is the address of a null-terminated regular expression.
|
||
`re_comp' uses an internal pattern buffer, so you can use only the most
|
||
recently compiled pattern buffer. This means that if you want to use a
|
||
given regular expression that you've already compiled--but it isn't the
|
||
latest one you've compiled--you'll have to recompile it. If you call
|
||
`re_comp' with the null string (*not* the empty string) as the
|
||
argument, it doesn't change the contents of the pattern buffer.
|
||
|
||
If `re_comp' successfully compiles the regular expression, it returns
|
||
zero. If it can't compile the regular expression, it returns an error
|
||
string. `re_comp''s error messages are identical to those of
|
||
`re_compile_pattern' (*note GNU Regular Expression Compiling::.).
|
||
|
||
|
||
File: regex.info, Node: BSD Searching, Prev: BSD Regular Expression Compiling, Up: BSD Regex Functions
|
||
|
||
BSD Searching
|
||
-------------
|
||
|
||
Searching the Berkeley UNIX way means searching in a string starting
|
||
at its first character and trying successive positions within it to
|
||
find a match. Once you've compiled a pattern using `re_comp' (*note
|
||
BSD Regular Expression Compiling::.), you can ask Regex to search for
|
||
that pattern in a string using:
|
||
|
||
int
|
||
re_exec (char *STRING)
|
||
|
||
STRING is the address of the null-terminated string in which you want
|
||
to search.
|
||
|
||
`re_exec' returns either 1 for success or 0 for failure. It
|
||
automatically uses a GNU fastmap (*note Searching with Fastmaps::.).
|
||
|
||
|
||
File: regex.info, Node: Copying, Next: Index, Prev: Programming with Regex, Up: Top
|
||
|
||
GNU GENERAL PUBLIC LICENSE
|
||
**************************
|
||
|
||
Version 2, June 1991
|
||
|
||
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
|
||
675 Mass Ave, Cambridge, MA 02139, USA
|
||
|
||
Everyone is permitted to copy and distribute verbatim copies
|
||
of this license document, but changing it is not allowed.
|
||
|
||
Preamble
|
||
========
|
||
|
||
The licenses for most software are designed to take away your freedom
|
||
to share and change it. By contrast, the GNU General Public License is
|
||
intended to guarantee your freedom to share and change free
|
||
software--to make sure the software is free for all its users. This
|
||
General Public License applies to most of the Free Software
|
||
Foundation's software and to any other program whose authors commit to
|
||
using it. (Some other Free Software Foundation software is covered by
|
||
the GNU Library General Public License instead.) You can apply it to
|
||
your programs, too.
|
||
|
||
When we speak of free software, we are referring to freedom, not
|
||
price. Our General Public Licenses are designed to make sure that you
|
||
have the freedom to distribute copies of free software (and charge for
|
||
this service if you wish), that you receive source code or can get it
|
||
if you want it, that you can change the software or use pieces of it in
|
||
new free programs; and that you know you can do these things.
|
||
|
||
To protect your rights, we need to make restrictions that forbid
|
||
anyone to deny you these rights or to ask you to surrender the rights.
|
||
These restrictions translate to certain responsibilities for you if you
|
||
distribute copies of the software, or if you modify it.
|
||
|
||
For example, if you distribute copies of such a program, whether
|
||
gratis or for a fee, you must give the recipients all the rights that
|
||
you have. You must make sure that they, too, receive or can get the
|
||
source code. And you must show them these terms so they know their
|
||
rights.
|
||
|
||
We protect your rights with two steps: (1) copyright the software, and
|
||
(2) offer you this license which gives you legal permission to copy,
|
||
distribute and/or modify the software.
|
||
|
||
Also, for each author's protection and ours, we want to make certain
|
||
that everyone understands that there is no warranty for this free
|
||
software. If the software is modified by someone else and passed on, we
|
||
want its recipients to know that what they have is not the original, so
|
||
that any problems introduced by others will not reflect on the original
|
||
authors' reputations.
|
||
|
||
Finally, any free program is threatened constantly by software
|
||
patents. We wish to avoid the danger that redistributors of a free
|
||
program will individually obtain patent licenses, in effect making the
|
||
program proprietary. To prevent this, we have made it clear that any
|
||
patent must be licensed for everyone's free use or not licensed at all.
|
||
|
||
The precise terms and conditions for copying, distribution and
|
||
modification follow.
|
||
|
||
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
||
|
||
1. This License applies to any program or other work which contains a
|
||
notice placed by the copyright holder saying it may be distributed
|
||
under the terms of this General Public License. The "Program",
|
||
below, refers to any such program or work, and a "work based on
|
||
the Program" means either the Program or any derivative work under
|
||
copyright law: that is to say, a work containing the Program or a
|
||
portion of it, either verbatim or with modifications and/or
|
||
translated into another language. (Hereinafter, translation is
|
||
included without limitation in the term "modification".) Each
|
||
licensee is addressed as "you".
|
||
|
||
Activities other than copying, distribution and modification are
|
||
not covered by this License; they are outside its scope. The act
|
||
of running the Program is not restricted, and the output from the
|
||
Program is covered only if its contents constitute a work based on
|
||
the Program (independent of having been made by running the
|
||
Program). Whether that is true depends on what the Program does.
|
||
|
||
2. You may copy and distribute verbatim copies of the Program's
|
||
source code as you receive it, in any medium, provided that you
|
||
conspicuously and appropriately publish on each copy an appropriate
|
||
copyright notice and disclaimer of warranty; keep intact all the
|
||
notices that refer to this License and to the absence of any
|
||
warranty; and give any other recipients of the Program a copy of
|
||
this License along with the Program.
|
||
|
||
You may charge a fee for the physical act of transferring a copy,
|
||
and you may at your option offer warranty protection in exchange
|
||
for a fee.
|
||
|
||
3. You may modify your copy or copies of the Program or any portion
|
||
of it, thus forming a work based on the Program, and copy and
|
||
distribute such modifications or work under the terms of Section 1
|
||
above, provided that you also meet all of these conditions:
|
||
|
||
a. You must cause the modified files to carry prominent notices
|
||
stating that you changed the files and the date of any change.
|
||
|
||
b. You must cause any work that you distribute or publish, that
|
||
in whole or in part contains or is derived from the Program
|
||
or any part thereof, to be licensed as a whole at no charge
|
||
to all third parties under the terms of this License.
|
||
|
||
c. If the modified program normally reads commands interactively
|
||
when run, you must cause it, when started running for such
|
||
interactive use in the most ordinary way, to print or display
|
||
an announcement including an appropriate copyright notice and
|
||
a notice that there is no warranty (or else, saying that you
|
||
provide a warranty) and that users may redistribute the
|
||
program under these conditions, and telling the user how to
|
||
view a copy of this License. (Exception: if the Program
|
||
itself is interactive but does not normally print such an
|
||
announcement, your work based on the Program is not required
|
||
to print an announcement.)
|
||
|
||
These requirements apply to the modified work as a whole. If
|
||
identifiable sections of that work are not derived from the
|
||
Program, and can be reasonably considered independent and separate
|
||
works in themselves, then this License, and its terms, do not
|
||
apply to those sections when you distribute them as separate
|
||
works. But when you distribute the same sections as part of a
|
||
whole which is a work based on the Program, the distribution of
|
||
the whole must be on the terms of this License, whose permissions
|
||
for other licensees extend to the entire whole, and thus to each
|
||
and every part regardless of who wrote it.
|
||
|
||
Thus, it is not the intent of this section to claim rights or
|
||
contest your rights to work written entirely by you; rather, the
|
||
intent is to exercise the right to control the distribution of
|
||
derivative or collective works based on the Program.
|
||
|
||
In addition, mere aggregation of another work not based on the
|
||
Program with the Program (or with a work based on the Program) on
|
||
a volume of a storage or distribution medium does not bring the
|
||
other work under the scope of this License.
|
||
|
||
4. You may copy and distribute the Program (or a work based on it,
|
||
under Section 2) in object code or executable form under the terms
|
||
of Sections 1 and 2 above provided that you also do one of the
|
||
following:
|
||
|
||
a. Accompany it with the complete corresponding machine-readable
|
||
source code, which must be distributed under the terms of
|
||
Sections 1 and 2 above on a medium customarily used for
|
||
software interchange; or,
|
||
|
||
b. Accompany it with a written offer, valid for at least three
|
||
years, to give any third party, for a charge no more than your
|
||
cost of physically performing source distribution, a complete
|
||
machine-readable copy of the corresponding source code, to be
|
||
distributed under the terms of Sections 1 and 2 above on a
|
||
medium customarily used for software interchange; or,
|
||
|
||
c. Accompany it with the information you received as to the offer
|
||
to distribute corresponding source code. (This alternative is
|
||
allowed only for noncommercial distribution and only if you
|
||
received the program in object code or executable form with
|
||
such an offer, in accord with Subsection b above.)
|
||
|
||
The source code for a work means the preferred form of the work for
|
||
making modifications to it. For an executable work, complete
|
||
source code means all the source code for all modules it contains,
|
||
plus any associated interface definition files, plus the scripts
|
||
used to control compilation and installation of the executable.
|
||
However, as a special exception, the source code distributed need
|
||
not include anything that is normally distributed (in either
|
||
source or binary form) with the major components (compiler,
|
||
kernel, and so on) of the operating system on which the executable
|
||
runs, unless that component itself accompanies the executable.
|
||
|
||
If distribution of executable or object code is made by offering
|
||
access to copy from a designated place, then offering equivalent
|
||
access to copy the source code from the same place counts as
|
||
distribution of the source code, even though third parties are not
|
||
compelled to copy the source along with the object code.
|
||
|
||
5. You may not copy, modify, sublicense, or distribute the Program
|
||
except as expressly provided under this License. Any attempt
|
||
otherwise to copy, modify, sublicense or distribute the Program is
|
||
void, and will automatically terminate your rights under this
|
||
License. However, parties who have received copies, or rights,
|
||
from you under this License will not have their licenses
|
||
terminated so long as such parties remain in full compliance.
|
||
|
||
6. You are not required to accept this License, since you have not
|
||
signed it. However, nothing else grants you permission to modify
|
||
or distribute the Program or its derivative works. These actions
|
||
are prohibited by law if you do not accept this License.
|
||
Therefore, by modifying or distributing the Program (or any work
|
||
based on the Program), you indicate your acceptance of this
|
||
License to do so, and all its terms and conditions for copying,
|
||
distributing or modifying the Program or works based on it.
|
||
|
||
7. Each time you redistribute the Program (or any work based on the
|
||
Program), the recipient automatically receives a license from the
|
||
original licensor to copy, distribute or modify the Program
|
||
subject to these terms and conditions. You may not impose any
|
||
further restrictions on the recipients' exercise of the rights
|
||
granted herein. You are not responsible for enforcing compliance
|
||
by third parties to this License.
|
||
|
||
8. If, as a consequence of a court judgment or allegation of patent
|
||
infringement or for any other reason (not limited to patent
|
||
issues), conditions are imposed on you (whether by court order,
|
||
agreement or otherwise) that contradict the conditions of this
|
||
License, they do not excuse you from the conditions of this
|
||
License. If you cannot distribute so as to satisfy simultaneously
|
||
your obligations under this License and any other pertinent
|
||
obligations, then as a consequence you may not distribute the
|
||
Program at all. For example, if a patent license would not permit
|
||
royalty-free redistribution of the Program by all those who
|
||
receive copies directly or indirectly through you, then the only
|
||
way you could satisfy both it and this License would be to refrain
|
||
entirely from distribution of the Program.
|
||
|
||
If any portion of this section is held invalid or unenforceable
|
||
under any particular circumstance, the balance of the section is
|
||
intended to apply and the section as a whole is intended to apply
|
||
in other circumstances.
|
||
|
||
It is not the purpose of this section to induce you to infringe any
|
||
patents or other property right claims or to contest validity of
|
||
any such claims; this section has the sole purpose of protecting
|
||
the integrity of the free software distribution system, which is
|
||
implemented by public license practices. Many people have made
|
||
generous contributions to the wide range of software distributed
|
||
through that system in reliance on consistent application of that
|
||
system; it is up to the author/donor to decide if he or she is
|
||
willing to distribute software through any other system and a
|
||
licensee cannot impose that choice.
|
||
|
||
This section is intended to make thoroughly clear what is believed
|
||
to be a consequence of the rest of this License.
|
||
|
||
9. If the distribution and/or use of the Program is restricted in
|
||
certain countries either by patents or by copyrighted interfaces,
|
||
the original copyright holder who places the Program under this
|
||
License may add an explicit geographical distribution limitation
|
||
excluding those countries, so that distribution is permitted only
|
||
in or among countries not thus excluded. In such case, this
|
||
License incorporates the limitation as if written in the body of
|
||
this License.
|
||
|
||
10. The Free Software Foundation may publish revised and/or new
|
||
versions of the General Public License from time to time. Such
|
||
new versions will be similar in spirit to the present version, but
|
||
may differ in detail to address new problems or concerns.
|
||
|
||
Each version is given a distinguishing version number. If the
|
||
Program specifies a version number of this License which applies
|
||
to it and "any later version", you have the option of following
|
||
the terms and conditions either of that version or of any later
|
||
version published by the Free Software Foundation. If the Program
|
||
does not specify a version number of this License, you may choose
|
||
any version ever published by the Free Software Foundation.
|
||
|
||
11. If you wish to incorporate parts of the Program into other free
|
||
programs whose distribution conditions are different, write to the
|
||
author to ask for permission. For software which is copyrighted
|
||
by the Free Software Foundation, write to the Free Software
|
||
Foundation; we sometimes make exceptions for this. Our decision
|
||
will be guided by the two goals of preserving the free status of
|
||
all derivatives of our free software and of promoting the sharing
|
||
and reuse of software generally.
|
||
|
||
NO WARRANTY
|
||
|
||
12. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
|
||
WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
|
||
LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
|
||
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
|
||
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT
|
||
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
|
||
FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
|
||
QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
|
||
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
|
||
SERVICING, REPAIR OR CORRECTION.
|
||
|
||
13. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
|
||
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
|
||
MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
|
||
LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
|
||
INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
|
||
INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
|
||
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU
|
||
OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
|
||
OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
|
||
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
|
||
|
||
END OF TERMS AND CONDITIONS
|
||
|
||
Appendix: How to Apply These Terms to Your New Programs
|
||
=======================================================
|
||
|
||
If you develop a new program, and you want it to be of the greatest
|
||
possible use to the public, the best way to achieve this is to make it
|
||
free software which everyone can redistribute and change under these
|
||
terms.
|
||
|
||
To do so, attach the following notices to the program. It is safest
|
||
to attach them to the start of each source file to most effectively
|
||
convey the exclusion of warranty; and each file should have at least
|
||
the "copyright" line and a pointer to where the full notice is found.
|
||
|
||
ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
|
||
Copyright (C) 19YY NAME OF AUTHOR
|
||
|
||
This program is free software; you can redistribute it and/or modify
|
||
it under the terms of the GNU General Public License as published by
|
||
the Free Software Foundation; either version 2 of the License, or
|
||
(at your option) any later version.
|
||
|
||
This program is distributed in the hope that it will be useful,
|
||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||
GNU General Public License for more details.
|
||
|
||
You should have received a copy of the GNU General Public License
|
||
along with this program; if not, write to the Free Software
|
||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||
|
||
Also add information on how to contact you by electronic and paper
|
||
mail.
|
||
|
||
If the program is interactive, make it output a short notice like this
|
||
when it starts in an interactive mode:
|
||
|
||
Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
|
||
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
|
||
This is free software, and you are welcome to redistribute it
|
||
under certain conditions; type `show c' for details.
|
||
|
||
The hypothetical commands `show w' and `show c' should show the
|
||
appropriate parts of the General Public License. Of course, the
|
||
commands you use may be called something other than `show w' and `show
|
||
c'; they could even be mouse-clicks or menu items--whatever suits your
|
||
program.
|
||
|
||
You should also get your employer (if you work as a programmer) or
|
||
your school, if any, to sign a "copyright disclaimer" for the program,
|
||
if necessary. Here is a sample; alter the names:
|
||
|
||
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
|
||
`Gnomovision' (which makes passes at compilers) written by James Hacker.
|
||
|
||
SIGNATURE OF TY COON, 1 April 1989
|
||
Ty Coon, President of Vice
|
||
|
||
This General Public License does not permit incorporating your
|
||
program into proprietary programs. If your program is a subroutine
|
||
library, you may consider it more useful to permit linking proprietary
|
||
applications with the library. If this is what you want to do, use the
|
||
GNU Library General Public License instead of this License.
|
||
|
||
|
||
File: regex.info, Node: Index, Prev: Copying, Up: Top
|
||
|
||
Index
|
||
*****
|
||
|
||
* Menu:
|
||
|
||
* $: Match-end-of-line Operator.
|
||
* (: Grouping Operators.
|
||
* ): Grouping Operators.
|
||
* *: Match-zero-or-more Operator.
|
||
* +: Match-one-or-more Operator.
|
||
* -: List Operators.
|
||
* .: Match-any-character Operator.
|
||
* :] in regex: Character Class Operators.
|
||
* ?: Match-zero-or-one Operator.
|
||
* {: Interval Operators.
|
||
* }: Interval Operators.
|
||
* [: in regex: Character Class Operators.
|
||
* [^: List Operators.
|
||
* [: List Operators.
|
||
* \': Match-end-of-buffer Operator.
|
||
* \<: Match-beginning-of-word Operator.
|
||
* \>: Match-end-of-word Operator.
|
||
* \{: Interval Operators.
|
||
* \}: Interval Operators.
|
||
* \b: Match-word-boundary Operator.
|
||
* \B: Match-within-word Operator.
|
||
* \s: Match-syntactic-class Operator.
|
||
* \S: Match-not-syntactic-class Operator.
|
||
* \w: Match-word-constituent Operator.
|
||
* \W: Match-non-word-constituent Operator.
|
||
* \`: Match-beginning-of-buffer Operator.
|
||
* \: List Operators.
|
||
* ]: List Operators.
|
||
* ^: List Operators.
|
||
* allocated initialization: GNU Regular Expression Compiling.
|
||
* alternation operator: Alternation Operator.
|
||
* alternation operator and ^: Match-beginning-of-line Operator.
|
||
* anchoring: Anchoring Operators.
|
||
* anchors: Match-end-of-line Operator.
|
||
* anchors: Match-beginning-of-line Operator.
|
||
* Awk: Predefined Syntaxes.
|
||
* back references: Back-reference Operator.
|
||
* backtracking: Match-zero-or-more Operator.
|
||
* backtracking: Alternation Operator.
|
||
* beginning-of-line operator: Match-beginning-of-line Operator.
|
||
* bracket expression: List Operators.
|
||
* buffer field, set by re_compile_pattern: GNU Regular Expression Compiling.
|
||
* buffer initialization: GNU Regular Expression Compiling.
|
||
* character classes: Character Class Operators.
|
||
* Egrep: Predefined Syntaxes.
|
||
* Emacs: Predefined Syntaxes.
|
||
* end in struct re_registers: Using Registers.
|
||
* end-of-line operator: Match-end-of-line Operator.
|
||
* fastmap initialization: GNU Regular Expression Compiling.
|
||
* fastmaps: Searching with Fastmaps.
|
||
* fastmap_accurate field, set by re_compile_pattern: GNU Regular Expression Compiling.
|
||
* Grep: Predefined Syntaxes.
|
||
* grouping: Grouping Operators.
|
||
* ignoring case: POSIX Regular Expression Compiling.
|
||
* interval expression: Interval Operators.
|
||
* matching list: List Operators.
|
||
* matching newline: List Operators.
|
||
* matching with GNU functions: GNU Matching.
|
||
* newline_anchor field in pattern buffer: Match-beginning-of-line Operator.
|
||
* nonmatching list: List Operators.
|
||
* not_bol field in pattern buffer: Match-beginning-of-line Operator.
|
||
* num_regs in struct re_registers: Using Registers.
|
||
* open-group operator and ^: Match-beginning-of-line Operator.
|
||
* or operator: Alternation Operator.
|
||
* parenthesizing: Grouping Operators.
|
||
* pattern buffer initialization: GNU Regular Expression Compiling.
|
||
* pattern buffer, definition of: GNU Pattern Buffers.
|
||
* POSIX Awk: Predefined Syntaxes.
|
||
* range argument to re_search: GNU Searching.
|
||
* regex.c: Overview.
|
||
* regex.h: Overview.
|
||
* regexp anchoring: Anchoring Operators.
|
||
* regmatch_t: Using Byte Offsets.
|
||
* regs_allocated: Using Registers.
|
||
* REGS_FIXED: Using Registers.
|
||
* REGS_REALLOCATE: Using Registers.
|
||
* REGS_UNALLOCATED: Using Registers.
|
||
* regular expressions, syntax of: Regular Expression Syntax.
|
||
* REG_EXTENDED: POSIX Regular Expression Compiling.
|
||
* REG_ICASE: POSIX Regular Expression Compiling.
|
||
* REG_NEWLINE: POSIX Regular Expression Compiling.
|
||
* REG_NOSUB: POSIX Regular Expression Compiling.
|
||
* RE_BACKSLASH_ESCAPE_IN_LIST: Syntax Bits.
|
||
* RE_BK_PLUS_QM: Syntax Bits.
|
||
* RE_CHAR_CLASSES: Syntax Bits.
|
||
* RE_CONTEXT_INDEP_ANCHORS: Syntax Bits.
|
||
* RE_CONTEXT_INDEP_ANCHORS (and ^): Match-beginning-of-line Operator.
|
||
* RE_CONTEXT_INDEP_OPS: Syntax Bits.
|
||
* RE_CONTEXT_INVALID_OPS: Syntax Bits.
|
||
* RE_DOT_NEWLINE: Syntax Bits.
|
||
* RE_DOT_NOT_NULL: Syntax Bits.
|
||
* RE_INTERVALS: Syntax Bits.
|
||
* RE_LIMITED_OPS: Syntax Bits.
|
||
* RE_NEWLINE_ALT: Syntax Bits.
|
||
* RE_NO_BK_BRACES: Syntax Bits.
|
||
* RE_NO_BK_PARENS: Syntax Bits.
|
||
* RE_NO_BK_REFS: Syntax Bits.
|
||
* RE_NO_BK_VBAR: Syntax Bits.
|
||
* RE_NO_EMPTY_RANGES: Syntax Bits.
|
||
* re_nsub field, set by re_compile_pattern: GNU Regular Expression Compiling.
|
||
* re_pattern_buffer definition: GNU Pattern Buffers.
|
||
* re_registers: Using Registers.
|
||
* re_syntax_options initialization: GNU Regular Expression Compiling.
|
||
* RE_UNMATCHED_RIGHT_PAREN_ORD: Syntax Bits.
|
||
* searching with GNU functions: GNU Searching.
|
||
* start argument to re_search: GNU Searching.
|
||
* start in struct re_registers: Using Registers.
|
||
* struct re_pattern_buffer definition: GNU Pattern Buffers.
|
||
* subexpressions: Grouping Operators.
|
||
* syntax field, set by re_compile_pattern: GNU Regular Expression Compiling.
|
||
* syntax bits: Syntax Bits.
|
||
* syntax initialization: GNU Regular Expression Compiling.
|
||
* syntax of regular expressions: Regular Expression Syntax.
|
||
* translate initialization: GNU Regular Expression Compiling.
|
||
* used field, set by re_compile_pattern: GNU Regular Expression Compiling.
|
||
* word boundaries, matching: Match-word-boundary Operator.
|
||
* \: The Backslash Character.
|
||
* \(: Grouping Operators.
|
||
* \): Grouping Operators.
|
||
* \|: Alternation Operator.
|
||
* ^: Match-beginning-of-line Operator.
|
||
* |: Alternation Operator.
|
||
|
||
|
||
|
||
Tag Table:
|
||
Node: Top1064
|
||
Node: Overview4562
|
||
Node: Regular Expression Syntax6746
|
||
Node: Syntax Bits7916
|
||
Node: Predefined Syntaxes14018
|
||
Node: Collating Elements vs. Characters17872
|
||
Node: The Backslash Character18835
|
||
Node: Common Operators21992
|
||
Node: Match-self Operator23445
|
||
Node: Match-any-character Operator23941
|
||
Node: Concatenation Operator24520
|
||
Node: Repetition Operators25017
|
||
Node: Match-zero-or-more Operator25436
|
||
Node: Match-one-or-more Operator27483
|
||
Node: Match-zero-or-one Operator28341
|
||
Node: Interval Operators29196
|
||
Node: Alternation Operator30991
|
||
Node: List Operators32489
|
||
Node: Character Class Operators35272
|
||
Node: Range Operator36901
|
||
Node: Grouping Operators38930
|
||
Node: Back-reference Operator40251
|
||
Node: Anchoring Operators43073
|
||
Node: Match-beginning-of-line Operator43447
|
||
Node: Match-end-of-line Operator44779
|
||
Node: GNU Operators45518
|
||
Node: Word Operators45767
|
||
Node: Non-Emacs Syntax Tables46391
|
||
Node: Match-word-boundary Operator47465
|
||
Node: Match-within-word Operator47858
|
||
Node: Match-beginning-of-word Operator48255
|
||
Node: Match-end-of-word Operator48588
|
||
Node: Match-word-constituent Operator48908
|
||
Node: Match-non-word-constituent Operator49234
|
||
Node: Buffer Operators49545
|
||
Node: Match-beginning-of-buffer Operator49952
|
||
Node: Match-end-of-buffer Operator50264
|
||
Node: GNU Emacs Operators50558
|
||
Node: Syntactic Class Operators50901
|
||
Node: Emacs Syntax Tables51307
|
||
Node: Match-syntactic-class Operator51963
|
||
Node: Match-not-syntactic-class Operator52560
|
||
Node: What Gets Matched?53150
|
||
Node: Programming with Regex53799
|
||
Node: GNU Regex Functions54237
|
||
Node: GNU Pattern Buffers55078
|
||
Node: GNU Regular Expression Compiling58303
|
||
Node: GNU Matching61181
|
||
Node: GNU Searching63101
|
||
Node: Matching/Searching with Split Data64913
|
||
Node: Searching with Fastmaps66369
|
||
Node: GNU Translate Tables68921
|
||
Node: Using Registers70892
|
||
Node: Freeing GNU Pattern Buffers77000
|
||
Node: POSIX Regex Functions77593
|
||
Node: POSIX Pattern Buffers78266
|
||
Node: POSIX Regular Expression Compiling78709
|
||
Node: POSIX Matching82836
|
||
Node: Reporting Errors84791
|
||
Node: Using Byte Offsets86048
|
||
Node: Freeing POSIX Pattern Buffers86861
|
||
Node: BSD Regex Functions87467
|
||
Node: BSD Regular Expression Compiling87886
|
||
Node: BSD Searching89258
|
||
Node: Copying89960
|
||
Node: Index109122
|
||
|
||
End Tag Table
|