libc: regexec(3) adjustment.

Change the behavior of regexec(3) when REG_STARTEND is combined with REG_NOTBOL.

From the original posting[1]:

"Enable the assumption that pmatch[0].rm_so is a continuation offset
to a string and allow us to do a proper assessment of the character
in regard to its word position ('^' or '\<'), without risking going
into unallocated memory."

This change makes our handling of REG_STARTEND | REG_NOTBOL similar
to glibc's and is closely related to a soon-to-land fix to sed.
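The following is a minimal sketch of the sed-style loop this enables. It assumes a libc that defines REG_STARTEND, which is an extension and not part of POSIX. The helper count_matches() is hypothetical.

The first search starts at offset 0, which really is a line start. Every resumed search adds REG_NOTBOL and passes the whole buffer plus an offset. The matcher can then look at the byte before rm_so and decide whether '^' may match there. Under the semantics described here, only the first 'a' in "aaa" matches "^a".

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#ifndef REG_STARTEND
#error "REG_STARTEND is an extension; this sketch needs a libc that defines it"
#endif

/*
 * Hypothetical helper: count non-overlapping matches of `pattern` in `s`,
 * resuming each search where the previous match ended, the way a
 * sed-style s/.../.../g loop does.  Returns -1 if `pattern` does not
 * compile.
 */
static int
count_matches(const char *pattern, int cflags, const char *s)
{
	regex_t re;
	regmatch_t pm[1];
	size_t len, off = 0;
	int eflags = REG_STARTEND;	/* offset 0 really is a line start */
	int n = 0;

	assert(pattern != NULL && s != NULL);
	if (regcomp(&re, pattern, cflags) != 0)
		return (-1);
	len = strlen(s);

	while (off <= len) {
		/* Search s[off, len); the whole buffer is still passed in. */
		pm[0].rm_so = (regoff_t)off;
		pm[0].rm_eo = (regoff_t)len;
		if (regexec(&re, s, 1, pm, eflags) != 0)
			break;
		n++;
		/* Returned offsets are relative to s, not to the old rm_so. */
		off = (size_t)pm[0].rm_eo;
		if (pm[0].rm_eo == pm[0].rm_so)
			off++;		/* step past an empty match */
		/*
		 * Resumed searches continue the same line: with REG_NOTBOL
		 * the byte before rm_so decides whether '^' may match.
		 */
		eflags = REG_STARTEND | REG_NOTBOL;
	}
	regfree(&re);
	return (n);
}
```

Consider count_matches("^foo", REG_NEWLINE, "foo\nfoo bar foo"), which is expected to return 2. The second "foo" follows a newline, so '^' may match before it. The third "foo" follows a space, so it cannot.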

Special thanks to Martijn van Duren and Ingo Schwarze for working
out some consistent behaviour.

Differential Revision:	https://reviews.freebsd.org/D6257
Taken from:	openbsd-tech 2016-05-24 [1]  (Martijn van Duren)
Relnotes:	yes
MFC after:	1 month
Pedro F. Giffuni 2016-05-25 15:35:23 +00:00
parent b7b46892f9
commit 93ea9f9fa1
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=300683
2 changed files with 50 additions and 24 deletions

lib/libc/regex/engine.c

@@ -786,7 +786,7 @@ fast( struct match *m,
 	ASSIGN(fresh, st);
 	SP("start", st, *p);
 	coldp = NULL;
-	if (start == m->beginp)
+	if (start == m->offp || (start == m->beginp && !(m->eflags&REG_NOTBOL)))
 		c = OUT;
 	else {
 		/*
@@ -891,7 +891,7 @@ slow( struct match *m,
 	SP("sstart", st, *p);
 	st = step(m->g, startst, stopst, st, NOTHING, st);
 	matchp = NULL;
-	if (start == m->beginp)
+	if (start == m->offp || (start == m->beginp && !(m->eflags&REG_NOTBOL)))
 		c = OUT;
 	else {
 		/*

lib/libc/regex/regex.3

@@ -32,7 +32,7 @@
 .\" @(#)regex.3 8.4 (Berkeley) 3/20/94
 .\" $FreeBSD$
 .\"
-.Dd August 17, 2005
+.Dd May 25, 2016
 .Dt REGEX 3
 .Os
 .Sh NAME
@@ -235,11 +235,16 @@ The
 argument is the bitwise OR of zero or more of the following flags:
 .Bl -tag -width REG_STARTEND
 .It Dv REG_NOTBOL
-The first character of
-the string
-is not the beginning of a line, so the
-.Ql ^\&
-anchor should not match before it.
+The first character of the string is treated as the continuation
+of a line.
+This means that the anchors
+.Ql ^\& ,
+.Ql [[:<:]] ,
+and
+.Ql \e<
+do not match before it; but see
+.Dv REG_STARTEND
+below.
 This does not affect the behavior of newlines under
 .Dv REG_NEWLINE .
 .It Dv REG_NOTEOL
@@ -247,19 +252,16 @@ The NUL terminating
 the string
 does not end a line, so the
 .Ql $\&
-anchor should not match before it.
+anchor does not match before it.
 This does not affect the behavior of newlines under
 .Dv REG_NEWLINE .
 .It Dv REG_STARTEND
 The string is considered to start at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_so
-and to have a terminating NUL located at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_eo
-(there need not actually be a NUL at that location),
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_so
+and to end before the byte located at
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_eo ,
 regardless of the value of
 .Fa nmatch .
 See below for the definition of
@@ -271,13 +273,37 @@ compatible with but not specified by
 .St -p1003.2 ,
 and should be used with
 caution in software intended to be portable to other systems.
-Note that a non-zero
-.Va rm_so
-does not imply
-.Dv REG_NOTBOL ;
-.Dv REG_STARTEND
-affects only the location of the string,
-not how it is matched.
+.Pp
+Without
+.Dv REG_NOTBOL ,
+the position
+.Fa rm_so
+is considered the beginning of a line, such that
+.Ql ^
+matches before it, and the beginning of a word if there is a word
+character at this position, such that
+.Ql [[:<:]]
+and
+.Ql \e<
+match before it.
+.Pp
+With
+.Dv REG_NOTBOL ,
+the character at position
+.Fa rm_so
+is treated as the continuation of a line, and if
+.Fa rm_so
+is greater than 0, the preceding character is taken into consideration.
+If the preceding character is a newline and the regular expression was compiled
+with
+.Dv REG_NEWLINE ,
+.Ql ^
+matches before the string; if the preceding character is not a word character
+but the string starts with a word character,
+.Ql [[:<:]]
+and
+.Ql \e<
+match before the string.
 .El
 .Pp
 See
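The hypothetical helper find_from() below exercises the new REG_STARTEND and REG_NOTBOL wording from user code. It searches only buf[so, eo) with REG_STARTEND | REG_NOTBOL and reports the match start as an offset into buf. No NUL is needed at eo.

The sketch assumes a libc that defines REG_STARTEND. The commit message describes only the REG_STARTEND | REG_NOTBOL combination as glibc-compatible, so the sketch uses that combination and nothing else.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "REG_STARTEND is an extension; this sketch needs a libc that defines it"
#endif

/*
 * Hypothetical helper: search buf[so, eo) for `pattern` with
 * REG_STARTEND | REG_NOTBOL, so the byte at buf[so - 1] (if so > 0)
 * decides whether '^' can match at `so`.  Returns the match start as an
 * offset into buf, -1 on no match, or -2 if the pattern does not compile.
 */
static long
find_from(const char *pattern, int cflags, const char *buf,
    size_t so, size_t eo)
{
	regex_t re;
	regmatch_t pm[1];
	long result;

	assert(pattern != NULL && buf != NULL && so <= eo);
	if (regcomp(&re, pattern, cflags) != 0)
		return (-2);
	pm[0].rm_so = (regoff_t)so;
	pm[0].rm_eo = (regoff_t)eo;	/* the string ends before buf[eo] */
	if (regexec(&re, buf, 1, pm, REG_STARTEND | REG_NOTBOL) == 0)
		result = (long)pm[0].rm_so;	/* relative to buf, not to so */
	else
		result = -1;
	regfree(&re);
	return (result);
}
```

For example, find_from("^bar", REG_NEWLINE, "foo\nbar", 4, 7) is expected to return 4, because the byte before offset 4 is a newline. Compiled without REG_NEWLINE, the same search returns -1. Because rm_eo marks the end of the string, "r$" matches at offset 6 of "foo\nbarbaz" when the range is [4, 7).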