MFC r300683:
libc: regexec(3) adjustment.

Change the behavior when REG_STARTEND is combined with REG_NOTBOL.

From the original posting[1]:

"Enable the assumption that pmatch[0].rm_so is a continuation offset
to a string and allows us to do a proper assessment of the character
in regards to its word position ('^' or '\<'), without risking going
into unallocated memory."

This change makes us similar to how glibc handles REG_STARTEND |
REG_NOTBOL, and is closely related to a soon-to-land fix to sed.

Special thanks to Martijn van Duren and Ingo Schwarze for working out
some consistent behaviour.

Differential Revision: https://reviews.freebsd.org/D6257
Taken from: openbsd-tech 2016-05-24 [1] (Martijn van Duren)
parent c45c824e6b
commit 33e55d7e3f
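The commit message treats pmatch[0].rm_so as a continuation offset into a longer string. A minimal sketch of the caller-side loop this enables, assuming only regcomp(3)/regexec(3) plus the REG_STARTEND extension; count_matches is a hypothetical helper, not part of libc or sed. Every call after the first passes REG_NOTBOL so the resume point is not treated as a fresh line start.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count non-overlapping matches of re in s[0, len),
 * resuming each search where the previous match ended.
 * Returns -1 if regexec reports an error other than REG_NOMATCH.
 */
static int
count_matches(const regex_t *re, const char *s, size_t len)
{
	regmatch_t pm[1];
	size_t off = 0;
	int eflags = 0;		/* first search: rm_so is a real line start */
	int n = 0;

	while (off <= len) {
		int rc;

		pm[0].rm_so = (regoff_t)off;	/* window start, relative to s */
		pm[0].rm_eo = (regoff_t)len;	/* window end (exclusive) */
		rc = regexec(re, s, 1, pm, eflags | REG_STARTEND);
		if (rc == REG_NOMATCH)
			break;
		if (rc != 0)
			return -1;
		n++;
		/* Match offsets are relative to s, so they can be reused directly. */
		off = (size_t)pm[0].rm_eo;
		if (pm[0].rm_eo == pm[0].rm_so)
			off++;			/* step past an empty match */
		/* Later searches continue the same line. */
		eflags = REG_NOTBOL;
	}
	return n;
}
```

With the pattern `ab` compiled via regcomp, count_matches(&re, "xxabyyabzzab", 12) returns 3. The example deliberately uses a pattern without anchors, since anchor handling at the resume point is exactly what this commit changes.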
@@ -786,7 +786,7 @@ fast( struct match *m,
 ASSIGN(fresh, st);
 SP("start", st, *p);
 coldp = NULL;
-if (start == m->beginp)
+if (start == m->offp || (start == m->beginp && !(m->eflags&REG_NOTBOL)))
 c = OUT;
 else {
 /*
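The same test is changed in both fast() and slow(): the engine now substitutes OUT (no preceding character) only at the very beginning of the caller's buffer, or at rm_so when REG_NOTBOL is absent; otherwise it reads the byte before start. A stand-alone C model of that condition follows. prev_char and SKETCH_OUT are illustrative names, and the real code works on the struct match fields m->offp, m->beginp and m->eflags rather than on parameters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the engine's OUT sentinel: "no previous character". */
#define SKETCH_OUT (-1)

/*
 * Hypothetical model of the changed condition.
 * buf    - the caller's string (m->offp)
 * start  - where this search attempt begins
 * so     - pmatch[0].rm_so under REG_STARTEND (m->beginp == buf + so)
 * notbol - whether REG_NOTBOL was passed
 */
static int
prev_char(const char *buf, const char *start, size_t so, bool notbol)
{
	const char *beginp = buf + so;

	/* At the buffer start there is nothing to read, so never touch buf[-1]. */
	if (start == buf || (start == beginp && !notbol))
		return SKETCH_OUT;
	/* Otherwise the real preceding byte decides line and word context. */
	return (unsigned char)start[-1];
}
```

For s = "x\nab", prev_char(s, s + 2, 2, true) yields '\n', while prev_char(s, s, 0, true) yields SKETCH_OUT, so the model never reads before s.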
@@ -891,7 +891,7 @@ slow( struct match *m,
 SP("sstart", st, *p);
 st = step(m->g, startst, stopst, st, NOTHING, st);
 matchp = NULL;
-if (start == m->beginp)
+if (start == m->offp || (start == m->beginp && !(m->eflags&REG_NOTBOL)))
 c = OUT;
 else {
 /*
@@ -32,7 +32,7 @@
 .\" @(#)regex.3 8.4 (Berkeley) 3/20/94
 .\" $FreeBSD$
 .\"
-.Dd August 17, 2005
+.Dd May 25, 2016
 .Dt REGEX 3
 .Os
 .Sh NAME
@@ -235,11 +235,16 @@ The
 argument is the bitwise OR of zero or more of the following flags:
 .Bl -tag -width REG_STARTEND
 .It Dv REG_NOTBOL
-The first character of
-the string
-is not the beginning of a line, so the
-.Ql ^\&
-anchor should not match before it.
+The first character of the string is treated as the continuation
+of a line.
+This means that the anchors
+.Ql ^\& ,
+.Ql [[:<:]] ,
+and
+.Ql \e<
+do not match before it; but see
+.Dv REG_STARTEND
+below.
 This does not affect the behavior of newlines under
 .Dv REG_NEWLINE .
 .It Dv REG_NOTEOL
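The revised REG_NOTBOL entry can be exercised with nothing beyond the POSIX interface. A minimal sketch; matches() is a hypothetical helper. Only ^ is checked here, because the handling of [[:<:]] and \< at the start of the string is what this commit changes.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile pattern as an ERE with the extra cflags,
 * then report whether regexec finds a match in s with the given eflags.
 * A compile failure is reported as "no match".
 */
static bool
matches(const char *pattern, const char *s, int cflags, int eflags)
{
	regex_t re;
	bool found;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | cflags) != 0)
		return false;
	found = regexec(&re, s, 0, NULL, eflags) == 0;
	regfree(&re);
	return found;
}
```

matches("^abc", "abc", 0, REG_NOTBOL) is false while matches("abc", "abc", 0, REG_NOTBOL) is true. In line with the REG_NEWLINE sentence, matches("^b", "a\nb", REG_NEWLINE, REG_NOTBOL) is still true, because the ^ matches after the embedded newline.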
@@ -247,19 +252,16 @@ The NUL terminating
 the string
 does not end a line, so the
 .Ql $\&
-anchor should not match before it.
+anchor does not match before it.
 This does not affect the behavior of newlines under
 .Dv REG_NEWLINE .
 .It Dv REG_STARTEND
 The string is considered to start at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_so
-and to have a terminating NUL located at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_eo
-(there need not actually be a NUL at that location),
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_so
+and to end before the byte located at
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_eo ,
 regardless of the value of
 .Fa nmatch .
 See below for the definition of
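Under the reworded REG_STARTEND entry the search window runs from string + rm_so up to, but not including, string + rm_eo, so the buffer does not need a NUL there. A sketch that searches such a window; find_in_window is a hypothetical helper, and it assumes (as the FreeBSD and glibc engines do) that the offsets stored back into pmatch[0] are measured from the start of the buffer, not from rm_so.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: search buf[so, eo) for pattern; buf need not be
 * NUL-terminated. On success stores the match bounds (relative to buf)
 * in *mso and *meo and returns 0; otherwise returns the regcomp or
 * regexec error code (REG_NOMATCH when nothing matched).
 */
static int
find_in_window(const char *pattern, const char *buf, size_t so, size_t eo,
    regoff_t *mso, regoff_t *meo)
{
	regex_t re;
	regmatch_t pm[1];
	int rc;

	if ((rc = regcomp(&re, pattern, REG_EXTENDED)) != 0)
		return rc;
	pm[0].rm_so = (regoff_t)so;	/* first byte searched */
	pm[0].rm_eo = (regoff_t)eo;	/* search stops before this byte */
	rc = regexec(&re, buf, 1, pm, REG_STARTEND);
	if (rc == 0) {
		*mso = pm[0].rm_so;
		*meo = pm[0].rm_eo;
	}
	regfree(&re);
	return rc;
}
```

With const char buf[4] = { 'a', 'b', 'c', 'b' } (no NUL), searching for b in the window [2, 4) reports the match at offsets 3 to 4, and searching for c in [0, 2) returns REG_NOMATCH because the c at index 2 lies outside the window.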
@@ -271,13 +273,37 @@ compatible with but not specified by
 .St -p1003.2 ,
 and should be used with
 caution in software intended to be portable to other systems.
-Note that a non-zero
-.Va rm_so
-does not imply
-.Dv REG_NOTBOL ;
-.Dv REG_STARTEND
-affects only the location of the string,
-not how it is matched.
+.Pp
+Without
+.Dv REG_NOTBOL ,
+the position
+.Fa rm_so
+is considered the beginning of a line, such that
+.Ql ^
+matches before it, and the beginning of a word if there is a word
+character at this position, such that
+.Ql [[:<:]]
+and
+.Ql \e<
+match before it.
+.Pp
+With
+.Dv REG_NOTBOL ,
+the character at position
+.Fa rm_so
+is treated as the continuation of a line, and if
+.Fa rm_so
+is greater than 0, the preceding character is taken into consideration.
+If the preceding character is a newline and the regular expression was compiled
+with
+.Dv REG_NEWLINE ,
+.Ql ^
+matches before the string; if the preceding character is not a word character
+but the string starts with a word character,
+.Ql [[:<:]]
+and
+.Ql \e<
+match before the string.
 .El
 .Pp
 See
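The two new paragraphs reduce to a small decision about the single byte before rm_so. A byte-oriented C model of those rules, assuming a word character is an alphanumeric or underscore in the C locale; anchors_at and is_word are hypothetical names. The function only decides whether ^ (bol) and [[:<:]] / \< (bow) may match at rm_so; it does not run a regular expression.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Word character for this sketch: alphanumeric or '_' (byte-oriented). */
static bool
is_word(int c)
{
	return c == '_' || isalnum((unsigned char)c);
}

/*
 * Hypothetical model of the documented rules for the window buf[so, eo).
 * newline: pattern compiled with REG_NEWLINE; notbol: REG_NOTBOL passed.
 * Sets *bol if '^' may match before buf[so], *bow if '\<' may.
 */
static void
anchors_at(const char *buf, size_t so, size_t eo, bool newline, bool notbol,
    bool *bol, bool *bow)
{
	bool starts_word = so < eo && is_word(buf[so]);

	if (!notbol) {
		/* rm_so is a line start, whatever precedes it. */
		*bol = true;
		*bow = starts_word;
	} else if (so == 0) {
		/* Continuation of a line with nothing before it to inspect. */
		*bol = false;
		*bow = false;
	} else {
		/* Only the byte before rm_so is consulted. */
		int prev = (unsigned char)buf[so - 1];

		*bol = newline && prev == '\n';
		*bow = starts_word && !is_word(prev);
	}
}
```

For "x\nabc" with rm_so = 2, REG_NOTBOL and REG_NEWLINE, both bol and bow come out true; without REG_NEWLINE only bow remains. For "xabc" with rm_so = 1 and REG_NOTBOL neither holds, because the preceding x is a word character.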