Commit Graph

53 Commits

Author SHA1 Message Date
Yuri Pankov
14fdf16371 sed: treat '[' as ordinary character in 'y' command
'y' does not handle bracket expressions, treat '[' as ordinary character
and do not apply bracket expression checks (GNU sed agrees).

PR:		247931
Reviewed by:	pfg, kevans
Tested by:	antoine (exp-run), Quentin L'Hours <lhoursquentin@gmail.com>
Differential Revision:	https://reviews.freebsd.org/D25640
2020-07-26 09:15:05 +00:00
Kyle Evans
3a4d70c588 sed: attempt to learn about hex escapes (e.g. \x27)
Somewhat predictably, software often wants to use \x27/\x24 among others so
that they can decline worrying about ugly escaping, if said escaping is even
possible. Right now, this software is using these and getting the wrong
results, as we'll interpret those as x27 and x24 respectively. Some examples
of this, when an exp-run was ran, were science/octopus and misc/vifm.

Go ahead and process these at all times.  We allow either one or two digits,
and the tests account for both.  If extra digits are specified, e.g. \x2727,
then the third and fourth digits are interpreted literally as one might
expect.

PR:		229925
MFC after:	2 weeks
2020-06-07 04:32:38 +00:00
Kyle Evans
6e816d8711 sed: process \r, \n, and \t
This is both reasonable and a common GNUism that a lot of ported software
expects.

Universally process \r, \n, and \t into carriage return, newline, and tab
respectively. Newline still doesn't function in contexts where it can't
(e.g. BRE), but we process it anyways rather than passing
UB \n (escaped ordinary) through to the underlying regex engine.

Adding a --posix flag to disable these was considered, but sed.1 already
declares this version of sed a super-set of POSIX specification and this
behavior is the most likely expected when one attempts to use one of these
escape sequences in pattern space.

This differs from pre-r197362 behavior in that we now honor the three
arguably most common escape sequences used with sed(1) and we do so outside
of character classes, too.

Other escape sequences, like \s and \S, will come later when GNU extensions
are added to libregex; sed will likely link against libregex by default,
since the GNU extensions tend to be fairly un-intrusive.

PR:		229925
Reviewed by:	bapt, emaste, pfg
Differential Revision:	https://reviews.freebsd.org/D22750
2019-12-10 19:16:00 +00:00
Pedro F. Giffuni
43daed4774 Revert r337419.
The fix is only partial and causes an asymmetry which breaks a test in
multi_test.sh.

We should consider both parts of the issue found in OpenBSD[1], but for now
just revert the change.

[1] http://undeadly.org/cgi?action=article;sid=20180728110010

Reported by: asomers
2018-08-16 18:35:39 +00:00
Pedro F. Giffuni
5102499601 sed(1): partial fix for the case of the regex delimited with '['.
We don't generally support the weird case of regular expresions delimited
by an opening square bracket ('[') but POSIX says that inside
bracket expressions, escaping is not possible and both '[' and '\'
represent themselves.

PR:		230198 (exp-run)
Obtained from:	OpenBSD
2018-08-07 14:47:39 +00:00
Pedro F. Giffuni
8a16b7a18f General further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:49:47 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Pedro F. Giffuni
b780b03cbe sed(1): add LEGACY_BSDSED_COMPAT compile-time flag.
In r297602, which included a __FreeBSD_version bump to 1100105, we changed
sed 'i' and 'a' from discarding whitespaces to conform with what GNU and
sysvish sed do.

There are arguments in favor of keeping the old behavior but the new
behavior is also useful for migration purposes. It seems important to at
least consider the case of developers depending on the previous behavior,
so add a CFLAG to enable the old behaviour.

PR:		213474
MFC after:	5 days
2016-11-04 20:49:59 +00:00
Pedro F. Giffuni
249a8a8003 sed(1): Revert r303047 "cleanup" and therefore r303572.
While big, the change was meant to have no effect on behavior and instead
so far we have found two regressions: one in the etcupdate tests and
another one in the games/openttd port[1].

Revert to a known working state. We will likely have to split the patch in
functional parts before bringing back the changes.

PR:		195929
Reported by:	danfe, madpilot [1]
2016-08-02 15:35:53 +00:00
Enji Cooper
131cac80e1 Explicitly test for cu_fgets returning NULL or !NULL
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
2016-07-30 04:40:44 +00:00
Pedro F. Giffuni
b4932d18ee sed(1): Assorted cleanups and simplifications.
Const-ify several variables, make it build cleanly with WARNS level 5.

Submitted by:	mi
PR:		195929
MFC after:	1 month
2016-07-19 22:56:40 +00:00
Pedro F. Giffuni
e55a6575d9 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-21 21:30:51 +00:00
Pedro F. Giffuni
678fec509d Fix sed functions 'i' and 'a' from discarding leading white space.
This appears to be implementation dependent but convenient and makes
our sed behave more like GNU sed.

Given that it is not the historic behavior, bump FreeBSD_version
should userland/ports somehow depend on it.

Obtained from:	NetBSD (bin/49872)

Reviewed by:	bdrewery
PR:		208554
Merge after:	NEVER
2016-04-06 00:55:39 +00:00
Eitan Adler
463a577b27 Fix a ton of speelling errors
arc lint is helpful

Reviewed By: allanjude, wblock, #manpages, chris@bsdjunk.com
Differential Revision: https://reviews.freebsd.org/D3337
2015-10-21 05:37:09 +00:00
Pedro F. Giffuni
a34b42e43c sed: Bounds check the file path used in the 'w' command.
Modified version of a diff from Sebastien Marie to prevent a crash found
with the afl fuzzer.

Obtained from:	OpenBSD (CVS Rev. 1.37)
MFC after:	1 week
2014-12-16 20:26:11 +00:00
Eitan Adler
49e8901465 Per the resolution of POSIX bug 0000779 (note 0002050) add support for using 'i'
as a case insensitive flag.

PR:		standards/184641
Requested by:	David A. Wheeler <dwheeler@dwheeler.com>
MFC After:	1 week
2013-12-09 18:57:20 +00:00
Diomidis Spinellis
ac8f32ce62 IEEE Std 1003.1, 2004 Edition states:
"The escape sequence '\n' shall match a <newline> embedded in
the pattern space."

It is unclear whether this also applies to a \n embedded in a
character class.  Disable the existing handling of \n in a character
class following Mac OS X, GNU sed version 4.1.5 with --posix, and
SunOS 5.10 /usr/bin/sed.

Pointed by:	Marius Strobl
Obtained from:	Mac OS X
2009-09-20 15:47:31 +00:00
Diomidis Spinellis
76570d0a99 Follow POSIX (IEEE Std 1003.1, 2004 Edition) in the implementation
of the y (translate) command.

"If a backslash character is immediately followed by a backslash
character in string1 or string2, the two backslash characters shall
be counted as a single literal backslash character"

Pointed by:	Marius Strobl
Obtained from:	Mac OS X
2009-09-20 15:17:40 +00:00
Diomidis Spinellis
128e6a12b5 Allow [ to be used as a delimiter.
Pointed by:	Marius Strobl
Obtained from:	Apple
2009-09-20 14:11:33 +00:00
Brian Somers
f879e8d923 Implement "addr1,+N" ranges - not dissimilar to grep's -A switch.
PR:		134856
Submitted by:	Jeremie Le Hen - jeremie at le-hen dot org
2009-05-25 06:45:33 +00:00
Diomidis Spinellis
46da6c4869 Fix the code to conform to the "or more" part of the following POSIX
specification and regression test regress:25.

  "A function can be preceded by one or more '!' characters, in which
  case the function shall be applied if the addresses do not select
  the pattern space."

MFC after:	2 weeks
2008-11-11 17:15:57 +00:00
Hiroki Sato
d3d0d3a3b5 Add workaround for a back reference when no corresponding
parenthesized subexpression is defined.  For example, the
following command line caused unexpected behavior like
segmentation fault:

 % echo test | sed -e 's/test/\1/'

PR:		bin/126682
MFC after:	1 week
2008-11-09 01:10:21 +00:00
David Malone
d3e5e11ce6 WARNS fixes:
1) Add missing parens around assignment that is compared to zero.
2) Make some variables that only take non-negative values unsigned.
3) Some casts/type changes to fix other constness warnings.
4) Make one variable a const char *.
5) Make sure termwidth is positive, it doesn't make sense for it to be negative.

Approved by:	dds
2008-02-09 09:12:02 +00:00
Xin LI
870945d830 Before doing compile_re() which needs a parameter to identify
whether we should ignore case, determine the flag by calling
compile_flags() first.  Also, make sure that we obtain an
initialized cmd->u.s buffer before processing further.  We
may want to refine this solution later, but for now, make
the changes in order to unbreak world build after a sed(1)
with rev. 1.29 of compile.c is installed.

Approved by:	re (hrs)
2007-07-06 16:34:56 +00:00
Suleiman Souhlal
bdd72b703b Add case-insensitive matching to sed, using the 'I' flag, similarly to GNU sed.
For example,
	sed /foo/Id
	sed s/foo/bar/Ig

Reviewed by:	dds
Approved by:	re (hrs)
2007-07-04 16:42:41 +00:00
Diomidis Spinellis
d432588e78 Bug fix: a numeric flag specification in the substitute command would
cause the next substitute flag to be ignored.
While working at it, detect and report overflows.

Reported by:	Jingsong Liu
MFC after:	1 week
2005-08-04 10:05:12 +00:00
Stefan Farfeleder
3dc2fe25dc Fix dubious C code construct. 2005-03-09 11:57:32 +00:00
Diomidis Spinellis
30fd73fb81 Per letter dated July 22, 1999 remove 3rd clause of Berkeley derived
software (original contributor).

Reviewed by:	imp
2004-08-09 15:29:41 +00:00
Tim J. Robbins
81a8648adb Make the 'y' (translate) command aware of multibyte characters. 2004-07-14 10:06:22 +00:00
Dag-Erling Smørgrav
9cde9a2e85 Whitespace cleanup 2003-11-04 12:16:47 +00:00
Dag-Erling Smørgrav
e6478125c8 ANSIfy 2003-11-04 12:15:20 +00:00
Tim J. Robbins
0467aed3c9 Ignore leading semicolons on commands; required by SUSv3.
Obtained from:	NetBSD (kleink, Aymeric Vincent)
2002-07-30 14:07:30 +00:00
Brian Feldman
f020c7fa88 Fix a bug in sed(1)'s "s" command wherein if an escape ("\" character)
was initiated at the last character of the line buffer, the Wrong
Thing was done and sed barfed by interpreting the following NUL byte
as a digit.  Instead, pull up the next buffer and record that the "\"
was last seen.
2002-06-01 13:25:47 +00:00
Maxim Sobolev
efaed24f7d Fix an ages-old bug in sed(1), which resulted in the absolutely valid
substitution expressions in the form `s,[fooexp],[barexp],;...' treated
as invalid when the third `,' is (_POSIX2_LINE_MAX * N)-th character in
the line.

MFC after:	2 weeks
2002-04-12 19:46:05 +00:00
Warner Losh
3f330d7d1a remove __P 2002-03-22 01:42:45 +00:00
Mark Murray
e74bf75f1c WARNS=2 partial fix; use NO_WERROR to protect against some hard-to-fix warnings.
Use __FBSDID(), kill register keyword.
2001-12-12 23:20:16 +00:00
Mike Heffner
0ea56610b4 Don't allocate a zero byte segment.
PR:		bin/11900
MFC after:	2 weeks
2001-11-08 16:47:05 +00:00
David E. O'Brien
8e33c0a0f6 Expand xmalloc in-place, along with xrealloc; which wasn't even ANSI in its
implementation.
2001-07-24 14:05:21 +00:00
Ruslan Ermilov
94ccf5741c Don't leak memory when compiling text following the a', c' or `i' command.
Testcase:

echo FOO | sed "/FOO/c\\
`jot -b 'aaaa\' 500`"

Submitted by:	Max Khon <fjoe@newst.net>
2001-05-18 09:48:17 +00:00
Brian Feldman
175de1e677 Add a new flag: -E enables "extended" regular expressions. 2000-03-19 19:41:53 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Archie Cobbs
c56690ef7b Fix a new bug introduced by the previous bug fix 1998-12-08 21:29:22 +00:00
Archie Cobbs
1a6583da3a Fix brokenness compiling "s/pat/subst/" when length of subst is >= 4090 chars.
PR:		bin/7939
1998-12-07 05:35:54 +00:00
Brian Somers
13ede3c083 Terminate our output string correctly if we've got
an ``a'' command that has an escaped newline on the
last line of the last script that we're processing.

This fixes exmh2/scripts/build when /etc/malloc.conf -> AJ
1998-09-22 18:39:47 +00:00
Philippe Charnier
73a08bb2ad Remove local redefinition for err(). Add usage(). 1997-08-11 07:21:08 +00:00
Andrey A. Chernov
726aebe5e0 Localize it
8bit cleanup
1996-08-11 17:46:35 +00:00
Bruce Evans
49e6559936 Yet^2 another fix for the line continuation bug.
The fundamental problem with the original code is that it accesses
p[-2] which is one before the beginning of the input buffer for
empty lines.  rev.1.6 just moved the problem from failures when
p[-2] happens to be '\\' to failures when it happens to be '\0'.
rev.1.5 was confused about the trailing newline and other things.

I went back to rev.1.5 and fixed it.  The result is the same as
Keith Bostic's final version in PR 1356 except it loses more
gracefully for excessively long input lines.
1996-07-17 12:18:51 +00:00
David Greenman
4f58776644 Yet another fix for the line continuation bug in sed. Keith's patch
introduced a new bug. This fix appears to work correctly. Fixes PR#1350.

Submitted by:	mark@linus.demon.co.uk (Mark Valentine)
1996-06-26 05:54:32 +00:00
David Greenman
796d43185d Fix from Keith Bostic <bostic@bsdi.com> for bug in sed dealing with
continuation lines.

Submitted by:	Keith Bostic via Kirk McKusick
1996-06-19 11:20:07 +00:00
Jordan K. Hubbard
ce19262d8c Merge various fixes from NetBSD. This will allow the WordPerfect for
SCO installation to run all the way through (some POSIX fixes).
1995-08-16 05:56:44 +00:00