Bugs have been found in the fastmatch implementation as used in bsdgrep.
Some have been fixed (r316495) while fixes for others are in review
(D10098).
In comparison with the fastmatch implementation, Kyle Evans found that:
- regex(3)'s performance with literal expressions offers a speed
improvement over fastmatch
- regex(3)'s performance, both with simple BREs and EREs, seems to be
comparable
The regex implementation was imported in r226035, and the commit message
reports:
This is a temporary solution until the whole regex library is
not replaced so that BSD grep development can continue and the
backported code gets some review and testing. This change only
improves scalability slightly, there is no big performance boost
yet but several minor bugs have been found and fixed.
Introduce a WITH_/WITHOUT_BSD_GREP_FASTMATCH knob to support testing
of both approaches.
PR: 175314, 194823
Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: bdrewery (in part)
Differential Revision: https://reviews.freebsd.org/D10282
r316477 broke zero-length matches when not using the -o flag, by
skipping over them entirely.
Add a regression test so that it doesn't break again in the future.
Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: cem emaste ngie
Differential Revision: https://reviews.freebsd.org/D10333
Make bsdgrep more sensitive to context overlaps. If it's printing
context that either overlaps or is immediately adjacent to another bit
of context, don't print a separator.
- Non-overlapping segments no longer have two separators between them
- Overlapping segments no longer have separators between them with
overlapping sections repeated
Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D10105
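The separator rule described in this change can be sketched as follows. This is only an illustration under stated assumptions: need_separator() and first_unprinted() are hypothetical names, not functions from the grep sources, and line numbers are assumed to be 1-based with 0 meaning that nothing has been printed yet.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from the grep sources.
 * last_printed: number of the last line already written (0 if none).
 * next_start:   number of the first line of the next context block.
 * Returns true when a "--" separator should precede the next block.
 */
static bool
need_separator(long last_printed, long next_start)
{
	assert(last_printed >= 0 && next_start >= 1);
	if (last_printed == 0)
		return (false);	/* nothing printed yet */
	/* Overlapping or immediately adjacent blocks get no separator. */
	return (next_start > last_printed + 1);
}

/* First line of the next block that has not been printed already. */
static long
first_unprinted(long last_printed, long next_start)
{
	return (next_start > last_printed ? next_start : last_printed + 1);
}
```

With last_printed set to 5, a block starting at line 6 or earlier is printed without a separator and resumes at line 6, while a block starting at line 7 or later gets exactly one separator.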
This is more sensible than the previous behaviour of grepping stdin,
and matches newer GNU grep behaviour.
PR: 216307
Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: cem, emaste, ngie
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/
-z treats input and output data as sequences of lines terminated by a
zero byte instead of a newline. This brings it more in line with GNU grep
and brings us closer to passing the current tests with BSD grep.
Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: cem
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D10101
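As a rough illustration of what -z implies for input handling, the sketch below splits a buffer into records on a configurable terminator. The names next_record() and count_records() are hypothetical and the code is not taken from bsdgrep; it only assumes that the terminator is '\n' by default and '\0' under -z.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the length of the record starting at buf,
 * not counting the terminator, and store in *consumed how many bytes
 * (record plus terminator, if present) the caller should skip.
 * delim is '\n' normally and '\0' when -z is in effect.
 */
static size_t
next_record(const char *buf, size_t len, int delim, size_t *consumed)
{
	const char *end;

	assert(buf != NULL && consumed != NULL);
	end = memchr(buf, delim, len);
	if (end == NULL) {
		/* The final record has no terminator. */
		*consumed = len;
		return (len);
	}
	*consumed = (size_t)(end - buf) + 1;
	return ((size_t)(end - buf));
}

/* Count the records in a buffer for the given terminator. */
static size_t
count_records(const char *buf, size_t len, int delim)
{
	size_t consumed, n = 0;

	while (len > 0) {
		(void)next_record(buf, len, delim, &consumed);
		buf += consumed;
		len -= consumed;
		n++;
	}
	return (n);
}
```

The same seven bytes "a\nb\0c\nd" are three records when split on newline and two records when split on the zero byte, which is why embedded newlines survive inside a record under -z.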
r316477 changed the color output to match exactly the in-tree GNU grep,
but introduces unnecessary escape sequences.
Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reported by: ache
MFC after: 1 month
MFC with: r316477
- Set REG_NOTBOL if we've already matched beginning of line and we're
examining later parts
- For each pattern we examine, apply it to the remaining bits of the
line rather than (potentially) smaller subsets
- Check for REG_NOSUB after we've looked at all patterns initially
matching the line
- Keep track of the last match we made to later determine if we're
simply not matching any longer or if we need to proceed another byte
because we hit a zero-length match
- Match the earliest and longest bit of each line before moving the
beginning of what we match to further in the line, past the end of the
longest match; this generally matches how gnugrep(1) seems to behave,
and seems like pretty good behavior to me
- Finally, bail out of printing any matches if we were set to print all
(empty pattern) but -o (output matches) was set
PR: 195763, 180990, 197555, 197531, 181263, 209116
Submitted by: "Kyle Evans" <kevans91@ksu.edu>
Reviewed by: cem
MFC after: 1 month
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D10104
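Two of the points above, setting REG_NOTBOL once the beginning of the line has been passed and stepping one byte past a zero-length match, can be sketched with plain regexec(3). This is a simplified illustration for a single pattern, not the bsdgrep loop itself; count_matches() and count_pattern() are hypothetical names, and the selection of the earliest and longest match across several patterns is left out.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the matches of one compiled pattern in a NUL-terminated line.
 * Hypothetical helper for illustration only.
 */
static size_t
count_matches(const regex_t *re, const char *line)
{
	regmatch_t pm;
	size_t off = 0, len, n = 0;
	int eflags;

	assert(re != NULL && line != NULL);
	len = strlen(line);
	while (off <= len) {
		/* Past offset 0 we are no longer at the beginning of the line. */
		eflags = (off > 0) ? REG_NOTBOL : 0;
		if (regexec(re, line + off, 1, &pm, eflags) != 0)
			break;
		n++;
		if (pm.rm_eo == pm.rm_so) {
			/* Zero-length match: step one byte to guarantee progress. */
			off += (size_t)pm.rm_eo + 1;
		} else {
			/* Resume just past the end of the match. */
			off += (size_t)pm.rm_eo;
		}
	}
	return (n);
}

/* Convenience wrapper: compile a basic regex, count, and free it. */
static long
count_pattern(const char *pattern, const char *line)
{
	regex_t re;
	size_t n;

	if (regcomp(&re, pattern, 0) != 0)
		return (-1);
	n = count_matches(&re, line);
	regfree(&re);
	return ((long)n);
}
```

Without REG_NOTBOL, the pattern "^a" would match at every restart point in "aaa"; with it, only the first character counts as the beginning of the line.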
Pull a copy of the filename string before calling basename(). Change the
loop to not return on its own, so we can put a free() statement at the
bottom.
detected.
Certain criteria must be met for this bug to show up:
* the -w flag is specified, and
* neither -o nor --color is specified, and
* the pattern is part of another word in the line, and
* the other word that contains the pattern occurs first
PR: 181973
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
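The failing case in the criteria above is a line such as "foobar foo" searched with -w for "foo": the first candidate sits inside a longer word, and the search has to continue along the line rather than stop there. The sketch below shows that idea for a literal word using strstr(3); it is an assumption-laden illustration with hypothetical names (is_word_char(), line_has_word()), whereas bsdgrep itself works on regex matches.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool
is_word_char(int c)
{
	return (isalnum((unsigned char)c) || c == '_');
}

/*
 * Hypothetical helper: true if the literal word occurs in line with no
 * word character directly before or after it. A candidate embedded in a
 * longer word is rejected, and the search resumes one byte later
 * instead of giving up on the line.
 */
static bool
line_has_word(const char *line, const char *word)
{
	const char *p = line, *hit;
	size_t wlen;

	assert(line != NULL && word != NULL);
	wlen = strlen(word);
	if (wlen == 0)
		return (false);
	while ((hit = strstr(p, word)) != NULL) {
		bool left_ok = (hit == line) || !is_word_char(hit[-1]);
		bool right_ok = !is_word_char(hit[wlen]);

		if (left_ok && right_ok)
			return (true);
		p = hit + 1;	/* keep looking further along the line */
	}
	return (false);
}
```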
backported that was written for the TRE integration project in Google
Summer of Code 2011. This is a temporary solution until the whole
regex library is not replaced so that BSD grep development can continue
and the backported code gets some review and testing. This change only
improves scalability slightly, there is no big performance boost yet
but several minor bugs have been found and fixed.
Approved by: delphij (mentor)
Sponsored by: Google Summer of Code 2011
MFC after: 1 week
- Make -F and -w work together
- Fix --color to colorize all of the matches
PR: bin/156826
Submitted by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: delphij (mentor)
- Makefile nit
- Add more CVS/SVN keywords to make it easier to track changes from NetBSD
in case they add further improvements
Approved by: delphij (mentor)
Obtained from: The NetBSD Project
instead of stdio. This gives BSD grep a very big performance boost;
its speed is now almost comparable to GNU grep.
Submitted by: Dimitry Andric <dimitry@andric.com>
Approved by: delphij (mentor)
former may be safer but in this case it doesn't add extra
safety [1]
- Fix -w option [2]
- Fix handling of GREP_OPTIONS [3]
- Fix --line-buffered
- Make stdin input imply --line-buffered so that tail -f can be piped
to grep [4]
- Imply -h if a single file is grepped; this is the GNU behaviour
- Reduce locking overhead to gain some more performance [5]
- Inline some functions to help the compiler better optimize the code
- Use shortcut for empty files [6]
PR: bin/149425 [6]
Prodded by: jilles [1]
Reported by: Alex Kozlov <spam@rm-rf.kiev.ua> [2] [3],
swell.k@gmail.com [2],
poyopoyo@puripuri.plala.or.jp [4]
Submitted by: scf [5],
Shuichi KITAGUCHI <ki@hh.iij4u.or.jp> [6]
Approved by: delphij (mentor)
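The item about standard input implying --line-buffered can be pictured with setvbuf(3). This is a sketch under assumptions, not the code from this change: output_buffer_mode() and configure_stdout() are hypothetical names, and the two boolean parameters stand in for however grep tracks the flag and the input source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: pick the stdio buffering mode for stdout.
 * Line buffering is used when requested explicitly or when the input
 * is standard input, so each match is flushed as soon as it is printed.
 */
static int
output_buffer_mode(bool line_buffered_flag, bool reading_stdin)
{
	return ((line_buffered_flag || reading_stdin) ? _IOLBF : _IOFBF);
}

/* Apply the chosen mode; must run before any output is written. */
static int
configure_stdout(bool line_buffered_flag, bool reading_stdin)
{
	int mode = output_buffer_mode(line_buffered_flag, reading_stdin);

	assert(mode == _IOLBF || mode == _IOFBF);
	return (setvbuf(stdout, NULL, mode, 0));
}
```

With line buffering in place, a pipeline such as tail -f piped into grep shows each matching line as it arrives instead of waiting for a full buffer.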
and exclusion patterns [1]
- Some improvements on the existing code, like replacing memcpy with
strlcpy/strcpy
Approved by: delphij (mentor)
Pointed out by: bf [1], des [1]
or if forced mode is specified [1]
- While here, add some alternative names for the options and make them
case-insensitive
- Fix -q and -l behaviour [2]
- Some small changes to make the code easier to review
Submitted by: swell.k@gmail.com [1],
dougb [2]
Approved by: delphij (mentor)
- Explicitly pre-zero memory for fts_open parameters.
- Don't test against directory patterns when we are testing a direct
leaf of the current directory.
While I'm there, plug a few memory leaks.
Deliverables: Small and clean code (1.4 KSLOC vs GNU's 8.5 KSLOC),
lower memory usage than GNU grep, GNU compatibility,
BSD license.
TODO: Performance is somewhat behind GNU grep but it is only
significant for bigger searches. The reason is complex, the
most important factor is that GNU grep uses lots of
optimizations to improve the speed of the regex library.
First, we need a modern regex library (practically by adopting
TRE), add support for GNU-style non-standard regexes and then
reevaluate the performance issues and look for bottlenecks. In
the meantime, for those who need better performance, it is
possible to build GNU grep by setting WITH_GNU_GREP.
Approved by: delphij (mentor)
Obtained from: OpenBSD (http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/grep/),
freegrep (http://github.com/howardjp/freegrep)
Sponsored by: Google SoC 2008
Portbuild tests run by: kris, pav, erwin
Acknowledgements to: fjoe (as SoC 2008 mentor),
everyone who helped in reviewing and testing