Commit Graph

53 Commits

Author SHA1 Message Date
yuripv
69509d8e92 sed: treat '[' as ordinary character in 'y' command
'y' does not handle bracket expressions, treat '[' as ordinary character
and do not apply bracket expression checks (GNU sed agrees).

PR:		247931
Reviewed by:	pfg, kevans
Tested by:	antoine (exp-run), Quentin L'Hours <lhoursquentin@gmail.com>
Differential Revision:	https://reviews.freebsd.org/D25640
2020-07-26 09:15:05 +00:00
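
A minimal sketch of what the commit above (69509d8e92) describes, assuming a sed that includes this change (the message notes that GNU sed agrees). The variable name `y_bracket_out` is only illustrative.

```shell
# In 'y', '[' is just another character to translate; no bracket
# expression is parsed.
y_bracket_out=$(printf 'a[b\n' | sed 'y/[/x/')
printf '%s\n' "$y_bracket_out"   # expected: axb
```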
kevans
4ece12e555 sed: attempt to learn about hex escapes (e.g. \x27)
Somewhat predictably, software often wants to use \x27/\x24 among others so
that they can decline worrying about ugly escaping, if said escaping is even
possible. Right now, this software is using these and getting the wrong
results, as we'll interpret those as x27 and x24 respectively. Some examples
of this, when an exp-run was run, were science/octopus and misc/vifm.

Go ahead and process these at all times.  We allow either one or two digits,
and the tests account for both.  If extra digits are specified, e.g. \x2727,
then the third and fourth digits are interpreted literally as one might
expect.

PR:		229925
MFC after:	2 weeks
2020-06-07 04:32:38 +00:00
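
A small sketch of the hex-escape handling described in 4ece12e555, assuming a sed that processes `\x` escapes in the replacement text (FreeBSD sed with this change, or GNU sed). The variable names are illustrative only.

```shell
# \x27 is the single quote; handy when the script itself is single-quoted.
hex_out=$(printf 'q\n' | sed 's/q/\x27/')
printf '%s\n' "$hex_out"         # expected: '

# Only two hex digits are consumed; the trailing "27" stays literal.
hex_extra_out=$(printf 'q\n' | sed 's/q/\x2727/')
printf '%s\n' "$hex_extra_out"   # expected: '27
```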
kevans
3a58a22a0b sed: process \r, \n, and \t
This is both reasonable and a common GNUism that a lot of ported software
expects.

Universally process \r, \n, and \t into carriage return, newline, and tab
respectively. Newline still doesn't function in contexts where it can't
(e.g. BRE), but we process it anyways rather than passing
UB \n (escaped ordinary) through to the underlying regex engine.

Adding a --posix flag to disable these was considered, but sed.1 already
declares this version of sed a super-set of POSIX specification and this
behavior is the most likely expected when one attempts to use one of these
escape sequences in pattern space.

This differs from pre-r197362 behavior in that we now honor the three
arguably most common escape sequences used with sed(1) and we do so outside
of character classes, too.

Other escape sequences, like \s and \S, will come later when GNU extensions
are added to libregex; sed will likely link against libregex by default,
since the GNU extensions tend to be fairly un-intrusive.

PR:		229925
Reviewed by:	bapt, emaste, pfg
Differential Revision:	https://reviews.freebsd.org/D22750
2019-12-10 19:16:00 +00:00
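
As a hedged illustration of 3a58a22a0b, the sketch below assumes a sed that expands `\t` in the replacement text (FreeBSD sed with this change, or GNU sed); older BSD seds would emit a literal `t` instead. `tab_out` is an illustrative name.

```shell
# Replace the space with a tab using the \t escape.
tab_out=$(printf 'a b\n' | sed 's/ /\t/')

# Compare against a tab produced by printf itself.
if [ "$tab_out" = "$(printf 'a\tb')" ]; then
    echo "sed expanded \\t to a tab"
else
    echo "sed did not expand \\t"
fi
```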
pfg
5d437294cc Revert r337419.
The fix is only partial and causes an asymmetry which breaks a test in
multi_test.sh.

We should consider both parts of the issue found in OpenBSD[1], but for now
just revert the change.

[1] http://undeadly.org/cgi?action=article;sid=20180728110010

Reported by: asomers
2018-08-16 18:35:39 +00:00
pfg
962d5382ee sed(1): partial fix for the case of the regex delimited with '['.
We don't generally support the weird case of regular expressions delimited
by an opening square bracket ('[') but POSIX says that inside
bracket expressions, escaping is not possible and both '[' and '\'
represent themselves.

PR:		230198 (exp-run)
Obtained from:	OpenBSD
2018-08-07 14:47:39 +00:00
pfg
872b698bd4 General further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:49:47 +00:00
imp
7e6cabd06e Renumber copyright clause 4
Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
pfg
0f62b80bb2 sed(1): add LEGACY_BSDSED_COMPAT compile-time flag.
In r297602, which included a __FreeBSD_version bump to 1100105, we changed
sed 'i' and 'a' from discarding whitespaces to conform with what GNU and
sysvish sed do.

There are arguments in favor of keeping the old behavior but the new
behavior is also useful for migration purposes. It seems important to at
least consider the case of developers depending on the previous behavior,
so add a CFLAG to enable the old behavior.

PR:		213474
MFC after:	5 days
2016-11-04 20:49:59 +00:00
pfg
b8ad49048a sed(1): Revert r303047 "cleanup" and therefore r303572.
While big, the change was meant to have no effect on behavior and instead
so far we have found two regressions: one in the etcupdate tests and
another one in the games/openttd port[1].

Revert to a known working state. We will likely have to split the patch in
functional parts before bringing back the changes.

PR:		195929
Reported by:	danfe, madpilot [1]
2016-08-02 15:35:53 +00:00
ngie
cedf419a7b Explicitly test for cu_fgets returning NULL or !NULL
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
2016-07-30 04:40:44 +00:00
pfg
0b44b26db4 sed(1): Assorted cleanups and simplifications.
Const-ify several variables, make it build cleanly with WARNS level 5.

Submitted by:	mi
PR:		195929
MFC after:	1 month
2016-07-19 22:56:40 +00:00
pfg
3f525b8a34 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep.

Discussed in:	freebsd-current
2016-04-21 21:30:51 +00:00
pfg
753486d87a Stop sed functions 'i' and 'a' from discarding leading white space.
This appears to be implementation dependent but convenient and makes
our sed behave more like GNU sed.

Given that it is not the historic behavior, bump FreeBSD_version
should userland/ports somehow depend on it.

Obtained from:	NetBSD (bin/49872)

Reviewed by:	bdrewery
PR:		208554
Merge after:	NEVER
2016-04-06 00:55:39 +00:00
eadler
21a3003f8f Fix a ton of speelling errors
arc lint is helpful

Reviewed By: allanjude, wblock, #manpages, chris@bsdjunk.com
Differential Revision: https://reviews.freebsd.org/D3337
2015-10-21 05:37:09 +00:00
pfg
5daaead431 sed: Bounds check the file path used in the 'w' command.
Modified version of a diff from Sebastien Marie to prevent a crash found
with the afl fuzzer.

Obtained from:	OpenBSD (CVS Rev. 1.37)
MFC after:	1 week
2014-12-16 20:26:11 +00:00
eadler
3fc4115992 Per the resolution of POSIX bug 0000779 (note 0002050) add support for using 'i'
as a case insensitive flag.

PR:		standards/184641
Requested by:	David A. Wheeler <dwheeler@dwheeler.com>
MFC After:	1 week
2013-12-09 18:57:20 +00:00
dds
5ce4473a06 IEEE Std 1003.1, 2004 Edition states:
"The escape sequence '\n' shall match a <newline> embedded in
the pattern space."

It is unclear whether this also applies to a \n embedded in a
character class.  Disable the existing handling of \n in a character
class following Mac OS X, GNU sed version 4.1.5 with --posix, and
SunOS 5.10 /usr/bin/sed.

Pointed by:	Marius Strobl
Obtained from:	Mac OS X
2009-09-20 15:47:31 +00:00
dds
668711df00 Follow POSIX (IEEE Std 1003.1, 2004 Edition) in the implementation
of the y (translate) command.

"If a backslash character is immediately followed by a backslash
character in string1 or string2, the two backslash characters shall
be counted as a single literal backslash character"

Pointed by:	Marius Strobl
Obtained from:	Mac OS X
2009-09-20 15:17:40 +00:00
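
A brief sketch of the POSIX rule quoted in 668711df00, assuming a POSIX-conforming `y` command. `y_backslash_out` is an illustrative name.

```shell
# The input line is a\b; in the script, "\\" in string1 counts as one
# literal backslash, which is translated to x.
y_backslash_out=$(printf 'a\\b\n' | sed 'y/\\/x/')
printf '%s\n' "$y_backslash_out"   # expected: axb
```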
dds
96199e6b70 Allow [ to be used as a delimiter.
Pointed by:	Marius Strobl
Obtained from:	Apple
2009-09-20 14:11:33 +00:00
brian
d2e1d02aee Implement "addr1,+N" ranges - not dissimilar to grep's -A switch.
PR:		134856
Submitted by:	Jeremie Le Hen - jeremie at le-hen dot org
2009-05-25 06:45:33 +00:00
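
A short sketch of the "addr1,+N" range from d2e1d02aee, assuming a sed that supports this extension (FreeBSD sed with this change, or GNU sed). `range_out` is an illustrative name.

```shell
# Print line 2 and the one line that follows it.
range_out=$(printf '1\n2\n3\n4\n5\n' | sed -n '2,+1p')
printf '%s\n' "$range_out"
# expected:
# 2
# 3
```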
dds
83407db259 Fix the code to conform to the "or more" part of the following POSIX
specification and regression test regress:25.

  "A function can be preceded by one or more '!' characters, in which
  case the function shall be applied if the addresses do not select
  the pattern space."

MFC after:	2 weeks
2008-11-11 17:15:57 +00:00
hrs
9fe9cedf42 Add workaround for a back reference when no corresponding
parenthesized subexpression is defined.  For example, the
following command line caused unexpected behavior like
segmentation fault:

 % echo test | sed -e 's/test/\1/'

PR:		bin/126682
MFC after:	1 week
2008-11-09 01:10:21 +00:00
dwmalone
61bc7e9048 WARNS fixes:
1) Add missing parens around assignment that is compared to zero.
2) Make some variables that only take non-negative values unsigned.
3) Some casts/type changes to fix other constness warnings.
4) Make one variable a const char *.
5) Make sure termwidth is positive, it doesn't make sense for it to be negative.

Approved by:	dds
2008-02-09 09:12:02 +00:00
delphij
668252a162 Before doing compile_re() which needs a parameter to identify
whether we should ignore case, determine the flag by calling
compile_flags() first.  Also, make sure that we obtain an
initialized cmd->u.s buffer before processing further.  We
may want to refine this solution later, but for now, make
the changes in order to unbreak world build after a sed(1)
with rev. 1.29 of compile.c is installed.

Approved by:	re (hrs)
2007-07-06 16:34:56 +00:00
ssouhlal
e75185f7f5 Add case-insensitive matching to sed, using the 'I' flag, similarly to GNU sed.
For example,
	sed /foo/Id
	sed s/foo/bar/Ig

Reviewed by:	dds
Approved by:	re (hrs)
2007-07-04 16:42:41 +00:00
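
The two invocations in e75185f7f5 can be tried as follows; this is a sketch assuming a sed that accepts the 'I' flag (FreeBSD sed with this change, or GNU sed). The variable names are illustrative.

```shell
# 'I' on a substitution: FOO matches the pattern foo.
subst_out=$(printf 'FOO\n' | sed 's/foo/bar/I')
printf '%s\n' "$subst_out"    # expected: bar

# 'I' on an address: the Foo line is deleted, the other line is kept.
delete_out=$(printf 'Foo\nbaz\n' | sed '/foo/Id')
printf '%s\n' "$delete_out"   # expected: baz
```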
dds
7380480109 Bug fix: a numeric flag specification in the substitute command would
cause the next substitute flag to be ignored.
While working at it, detect and report overflows.

Reported by:	Jingsong Liu
MFC after:	1 week
2005-08-04 10:05:12 +00:00
stefanf
b4856acd04 Fix dubious C code construct.
2005-03-09 11:57:32 +00:00
dds
145dad6e9d Per letter dated July 22, 1999 remove 3rd clause of Berkeley derived
software (original contributor).

Reviewed by:	imp
2004-08-09 15:29:41 +00:00
tjr
b7f5e217dd Make the 'y' (translate) command aware of multibyte characters.
2004-07-14 10:06:22 +00:00
des
bc082b44cb Whitespace cleanup
2003-11-04 12:16:47 +00:00
des
b91f0f9009 ANSIfy
2003-11-04 12:15:20 +00:00
tjr
e465cc4382 Ignore leading semicolons on commands; required by SUSv3.
Obtained from:	NetBSD (kleink, Aymeric Vincent)
2002-07-30 14:07:30 +00:00
green
20552d4d83 Fix a bug in sed(1)'s "s" command wherein if an escape ("\" character)
was initiated at the last character of the line buffer, the Wrong
Thing was done and sed barfed by interpreting the following NUL byte
as a digit.  Instead, pull up the next buffer and record that the "\"
was last seen.
2002-06-01 13:25:47 +00:00
sobomax
08c080ac1d Fix an ages-old bug in sed(1), which resulted in the absolutely valid
substitution expressions in the form `s,[fooexp],[barexp],;...' treated
as invalid when the third `,' is (_POSIX2_LINE_MAX * N)-th character in
the line.

MFC after:	2 weeks
2002-04-12 19:46:05 +00:00
imp
0b20191705 remove __P
2002-03-22 01:42:45 +00:00
markm
f7397edc4d WARNS=2 partial fix; use NO_WERROR to protect against some hard-to-fix warnings.
Use __FBSDID(), kill register keyword.
2001-12-12 23:20:16 +00:00
mikeh
571866fa14 Don't allocate a zero byte segment.
PR:		bin/11900
MFC after:	2 weeks
2001-11-08 16:47:05 +00:00
obrien
6dc73139c0 Expand xmalloc in-place, along with xrealloc, which wasn't even ANSI in its
implementation.
2001-07-24 14:05:21 +00:00
ru
4048b83188 Don't leak memory when compiling text following the `a', `c' or `i' command.
Testcase:

echo FOO | sed "/FOO/c\\
`jot -b 'aaaa\' 500`"

Submitted by:	Max Khon <fjoe@newst.net>
2001-05-18 09:48:17 +00:00
green
f556f75f09 Add a new flag: -E enables "extended" regular expressions.
2000-03-19 19:41:53 +00:00
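
A minimal sketch of the -E flag from f556f75f09, assuming a sed that accepts -E for extended regular expressions. `ere_out` is an illustrative name.

```shell
# With -E, an unescaped '+' means "one or more", so "aa" is replaced as a unit.
ere_out=$(printf 'aab\n' | sed -E 's/a+/X/')
printf '%s\n' "$ere_out"   # expected: Xb
```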
peter
3b842d34e8 $Id$ -> $FreeBSD$
1999-08-28 01:08:13 +00:00
archie
e11c48f4b1 Fix a new bug introduced by the previous bug fix
1998-12-08 21:29:22 +00:00
archie
70bef8d209 Fix brokenness compiling "s/pat/subst/" when length of subst is >= 4090 chars.
PR:		bin/7939
1998-12-07 05:35:54 +00:00
brian
a642c78f62 Terminate our output string correctly if we've got
an ``a'' command that has an escaped newline on the
last line of the last script that we're processing.

This fixes exmh2/scripts/build when /etc/malloc.conf -> AJ
1998-09-22 18:39:47 +00:00
charnier
db6e9b0f81 Remove local redefinition for err(). Add usage().
1997-08-11 07:21:08 +00:00
ache
4fbce9eec9 Localize it
8bit cleanup
1996-08-11 17:46:35 +00:00
bde
8c325c27d2 Yet^2 another fix for the line continuation bug.
The fundamental problem with the original code is that it accesses
p[-2] which is one before the beginning of the input buffer for
empty lines.  rev.1.6 just moved the problem from failures when
p[-2] happens to be '\\' to failures when it happens to be '\0'.
rev.1.5 was confused about the trailing newline and other things.

I went back to rev.1.5 and fixed it.  The result is the same as
Keith Bostic's final version in PR 1356 except it loses more
gracefully for excessively long input lines.
1996-07-17 12:18:51 +00:00
dg
db69ec2618 Yet another fix for the line continuation bug in sed. Keith's patch
introduced a new bug. This fix appears to work correctly. Fixes PR#1350.

Submitted by:	mark@linus.demon.co.uk (Mark Valentine)
1996-06-26 05:54:32 +00:00
dg
707c90b7b6 Fix from Keith Bostic <bostic@bsdi.com> for bug in sed dealing with
continuation lines.

Submitted by:	Keith Bostic via Kirk McKusick
1996-06-19 11:20:07 +00:00
jkh
529c9b015a Merge various fixes from NetBSD. This will allow the WordPerfect for
SCO installation to run all the way through (some POSIX fixes).
1995-08-16 05:56:44 +00:00