Commit Graph

27 Commits

Author SHA1 Message Date
kevans
d56f0ccdf6 grep: replace the internal queue with a ring buffer
We know up front how many items we can have in the queue (-B/Bflag), so
pay the cost of those particular allocations early on.

The reduced queue maintenance overhead seemed to yield about an ~8%
improvement for my earlier `grep -C8 -r closefrom .` test.

MFC after:	2 weeks
2020-12-09 05:27:45 +00:00
kevans
cc9b40180f bsdgrep(1): Slooowly peel away the chunky onion
(or peel off the band-aid, whatever floats your boat)

This addresses two separate issues:

1.) Nothing within bsdgrep actually knew whether it cared about line numbers
  or not.

2.) The file layer knew nothing about the context in which it was being
  called.

#1 is only important when we're *not* processing line-by-line. #2 is
debatably a good idea; the parsing context is only handy because that's
where we store current offset information and, as of this commit, whether or
not it needs to be line-aware.
2018-06-08 01:25:07 +00:00
kevans
91ec7ef2a1 bsdgrep(1): Do some less dirty things with return types
Neither procfile nor grep_tree return anything meaningful to their callers.
None of the callers actually care about how many lines were matched in all
of the files they processed; it's all about "did anything match?"

This is generally just a light refactoring to remind me of what actually
matters as I'm rewriting these bits to care less about 'stuff'.
2018-06-07 18:27:58 +00:00
bapt
83eb83a386 Remove NLS support from BSD grep
GNU grep as in actually in base does not have any translations support
compiled in, so no functionnality loss.

We do support 193 locales in base, we will never catch up on that number of
translation with bsd grep.

Removing NLS support make bsd grep consistent with the other binaries in base
which are not translated, and also reduce a little bit the code.

Reviewed by:	kevans
Approved by:	kevans
Discussed with:	kevans @BSDCan
Differential Revision:	https://reviews.freebsd.org/D15682
2018-06-06 23:12:35 +00:00
kevans
da653eef52 bsdgrep: annihilate our in-tree TRE, previously disabled by default
It was an old TRE that had plenty of bugs and no performance gain over
regex(3). I disabled it by default in r323615, and there was some confusion
about what the knob does- likely due to poor naming on my part- to the tune
of "well, it sounds like it should speed things up" (mentioned by multiple
people).

To compound this, I have no intention of maintaining a second regex
implementation. If someone would like to step up and volunteer to maintain a
lean-and-mean implementation for grep, this is OK, but we have very few
volunteers to maintain even our primary regex implementation.
2018-05-04 03:13:25 +00:00
bapt
a4e032fde9 Remove compression support from bsdgrep
Compression support is now handled by an external script, remove it from the
bsdgrep(1) utility.
This removes the support for -Z -J -X and -M

Note: that it matches the changes in newer GNU grep

Reviewed by:	kevans
Approved by:	kevans
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D15197
2018-04-25 14:40:15 +00:00
pfg
7551d83c35 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
kevans
17df4eb706 bsdgrep: add a primitive literal matcher
fgrep/grep -F will error out at runtime if compiled with a regex(3)
that does not define REG_NOSPEC or REG_LITERAL. glibc is one such regex(3)
implementation, and as it turns out they don't support literal matching at
all.

Provide a primitive literal matcher for use with glibc and other
implementations that don't support literal matching so that we don't
completely lose fgrep/grep -F if compiled against libgnuregex on stable/10,
stable/11, or other systems that we don't necessarily support.

This is a wholly unoptimized implementation with no plans to optimize it as
of now. This is due to both its use-case being primarily on unsupported
systems in the near-distant future and that it's reinventing the wheel that
we already have available as a feature of regex(3).

Reviewed by:	cem, emaste, ngie
Approved by:	emaste (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D12056
2017-08-24 01:23:33 +00:00
emaste
2298c6d8b6 bsdgrep: bump version number and add Kyle Evans copyright
The following changes have been made over the last couple of months:

Features:

 - With bsdgrep -r, the working directory is implied if no directory is
   specified
 - bsdgrep will now behave as bsdgrep -r does when it's named rgrep
 - bsdgrep now understands -z/--null-data to use \0 as EOL
 - GNU regex compatibility is now indicated with a "GNU compatible" in
   the version string

Fixes:

 - --mmap no longer hangs when coming across an EOF without an
   accompanying EOL
 - -o/--color matching generally improved, now produces earliest /
   longest matches
 - Context output now more closely aligns with GNU grep
 - Zero-length matches no longer exhibit broken behavior
 - Every output line now honors -b/-H/-n flags

Tests have been added for previous regressions as well as other
previously untested behaviors.

Various other fixes have been commited, and refactoring for further /
later improvements has taken place.

(The original submission changed the version string to 2.5.2, but I
decided to use 2.6.0 to reflect the addition of new features.)

Submitted by:	Kyle Evans <kevans91@ksu.edu>
Differential Revision:	https://reviews.freebsd.org/D10982
2017-05-29 13:10:01 +00:00
emaste
6b50dcbb8c bsdgrep: correct assumptions to prepare for chunking
Correct a couple of minor BSD grep assumptions that are valid for line
processing but not future chunk-based processing.

Submitted by:	Kyle Evans <kevans91@ksu.edu>
Reviewed by:	bapt, cem
Differential Revision:	https://reviews.freebsd.org/D10824
2017-05-26 02:30:26 +00:00
emaste
5249e4567c bsdgrep: Correct per-line line metadata printing
Metadata printing with -b, -H, or -n flags suffered from a few flaws:

1) -b/offset printing was broken when used in conjunction with -o

2) With -o, bsdgrep did not print metadata for every match/line, just
   the first match of a line

3) There were no tests for this

Address these issues by outputting this data per-match if the -o flag is
specified, and prior to outputting any matches if -o but not --color,
since --color alone will not generate a new line of output for every
iteration over the matches.

To correct -b output, fudge the line offset as we're printing matches.

While here, make sure we're using grep_printline in -A context.  Context
printing should *never* look at the parsing context, just the line.

The tests included do not pass with gnugrep in base due to it exhibiting
similar quirky behavior that bsdgrep previously exhibited.

Submitted by:	Kyle Evans <kevans91@ksu.edu>
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D10580
2017-05-20 11:20:03 +00:00
emaste
a32ff2cabf bsdgrep: don't allow negative -A / -B / -C
Previously, when given a negative -A/-B/-C argument bsdgrep would
overflow the respective context flag(s) and exhibited surprising
behavior.

Fix this by removing unsignedness of Aflag/Bflag and erroring out if
we're given a value < 0.  Also adjust the type used to track 'tail'
context in procfile() so that it accurately reflects the Aflag value
rather than overflowing and losing trailing context.

This also fixes an inconsistency previously existing between -n and
-C "n" behavior.  They are now both limited to LLONG_MAX, to be
consistent.

Add some test cases to make sure grep errors out properly for both
negative context values as well as non-numeric context values rather
than giving bogus matches.

Submitted by:	Kyle Evans <kevans91@ksu.edu>
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D10675
2017-05-15 17:51:01 +00:00
emaste
e028666ef0 bsdgrep: fix -w flag matching with an empty pattern
-w flag matching with an empty pattern was generally 'broken', allowing
matches to occur on any line whether or not it actually matches -w
criteria.

This fix required a good amount of refactoring to address.  procline()
is altered to *only* process the line and return whether it was a match
or not, necessary to be able to short-circuit the whole function in case
of this matchall flag. -m flag handling is moved out as well because it
suffers from the same fate as context handling if we bypass any actual
pattern matching.

The matching context (matches, mostly) didn't previously exist outside
of procline(), so we go ahead and create context object for file
processing bits to pass around.  grep_printline() was created due to
this, for the scenarios where the matches don't actually matter and we
just want to print a line or two, a la flushing the context queue and
no -o or --color specified.

Damage from this broken behavior would have been mitigated by the fact
that it is unlikely users would invoke grep -w with an empty pattern.

This was identified while checking PR 105221 for problems it this may
cause in BSD grep, but PR 105221 is *not* a report of this behavior.

Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Differential Revision:	https://reviews.freebsd.org/D10433
2017-05-02 20:39:33 +00:00
emaste
dd5e4bc8dd bsdgrep: add BSD_GREP_FASTMATCH knob for built-in fastmatch
Bugs have been found in the fastmatch implementation as used in bsdgrep.
Some have been fixed (r316495) while fixes for others are in review
(D10098).

In comparison with the fastmatch implementation, Kyle Evans found that:

- regex(3)'s performance with literal expressions offers a speed
  improvement over fastmatch

- regex(3)'s performance, both with simple BREs and EREs, seems to be
  comparable

The regex implementation was imported in r226035, and the commit message
reports:

    This is a temporary solution until the whole regex library is
    not replaced so that BSD grep development can continue and the
    backported code gets some review and testing. This change only
    improves scalability slightly, there is no big performance boost
    yet but several minor bugs have been found and fixed.

Introduce a WITH_/WITHOUT_BSD_GREP_FASTMATCH knob to support testing
of both approaches.

PR:		175314, 194823
Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Reviewed by:	bdrewery (in part)
Differential Revision:	https://reviews.freebsd.org/D10282
2017-04-21 14:36:09 +00:00
emaste
cf9dd707cc bsdgrep: add -z/--null-data support
-z treats input and output data as sequences of lines terminated by a
zero byte instead of a newline. This brings it more in line with GNU grep
and brings us closer to passing the current tests with BSD grep.

Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Reviewed by:	cem
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D10101
2017-04-17 13:14:18 +00:00
pfg
2a89efd87b Various style(9) fixes and typos in grep, sort and patch.
MFC after:	3 days
2014-04-21 22:52:18 +00:00
eadler
7e7d24dbbf Make bsdgrep behave as gnugrep and as documented: -m should only stop
reading the specific file, not any file.

Tested by:	frogs (irc)
Reviewed by:	gabor
Approved by:	cperciva (implicit)
MFC after:	1 week
2012-12-20 17:38:14 +00:00
gabor
ecc4a991f3 - Match GNU behavior of exit code
- Rename variable that has a different meaning now

PR:		bin/162930
Submitted by:	Jan Beich <jbeich@tormail.net>
MFC after:	1 week
2011-12-07 12:25:28 +00:00
gabor
1cb16d9887 Update BSD grep to the latest development version. It has some code
backported that was written for the TRE integration project in Google
Summer of Code 2011.  This is a temporary solution until the whole
regex library is not replaced so that BSD grep development can continue
and the backported code gets some review and testing.  This change only
improves scalability slightly, there is no big performance boost yet
but several minor bugs have been found and fixed.

Approved by:	delphij (mentor)
Sposored by:	Google Summer of Code 2011
MFC after:	1 week
2011-10-05 09:56:43 +00:00
gabor
43c556e7f5 - Adjust a comment to actual behaviour
- Makefile nit
- Add more CVS/SVN keywords to make it easier to track changes from NetBSD
  in case they add further improvements

Approved by:	delphij (mentor)
Obtained from:	The NetBSD Project
2011-04-07 13:03:35 +00:00
gabor
43d1fb6126 - Simplify the fixed string pattern preprocessing code
- Improve readability

Approved by:	delphij (mentor)
Obtained from:	The NetBSD Project
2011-04-07 13:01:03 +00:00
des
2dc227ede0 UTFize my name. 2010-08-19 09:28:59 +00:00
gabor
fb7d7246f6 - Refactor file reading code to use pure syscalls and an internal buffer
instead of stdio.  This gives BSD grep a very big performance boost,
  its speed is now almost comparable to GNU grep.

Submitted by:	Dimitry Andric <dimitry@andric.com>
Approved by:	delphij (mentor)
2010-08-18 17:40:10 +00:00
gabor
5ee11c8783 - Revert strlcpy() changes to memcpy() because it's more efficient and
former may be safer but in this case it doesn't add extra
  safety [1]
- Fix -w option [2]
- Fix handling of GREP_OPTIONS [3]
- Fix --line-buffered
- Make stdin input imply --line-buffered so that tail -f can be piped
  to grep [4]
- Imply -h if single file is grepped, this is the GNU behaviour
- Reduce locking overhead to gain some more performance [5]
- Inline some functions to help the compiler better optimize the code
- Use shortcut for empty files [6]

PR:		bin/149425 [6]
Prodded by:	jilles [1]
Reported by:	Alex Kozlov <spam@rm-rf.kiev.ua> [2] [3],
		swell.k@gmail.com [2],
		poyopoyo@puripuri.plala.or.jp [4]
Submitted by:	scf [5],
		Shuichi KITAGUCHI <ki@hh.iij4u.or.jp> [6]
Approved by:	delphij (mentor)
2010-08-15 22:15:04 +00:00
gabor
d3f9c5a1b7 - Use the traditional behaviour for filename and directory name inclusion
and exclusion patterns [1]
- Some improvements on the exiting code, like replacing memcpy with
  strlcpy/strcpy

Approved by:	delphij (mentor)
Pointed out by:	bf [1], des [1]
2010-07-29 00:11:14 +00:00
gabor
6a65a7e10a - Fix --color behaviour to only output color sequences if stdout is a tty
or if forced mode is specified [1]
- While here, add some alternative names for the options and make then
  case-insensitive
- Fix -q and -l behaviour [2]
- Some small changes to make the code easier to review

Submitted by:   swell.k@gmail.com [1],
                dougb [2]
Approved by:    delphij (mentor)
2010-07-25 08:42:18 +00:00
gabor
17349bffe4 Add BSD grep to the base system and make it our default grep.
Deliverables: Small and clean code (1,4 KSLOC vs GNU's 8,5 KSLOC),
              lower memory usage than GNU grep, GNU compatibility,
              BSD license.

TODO:         Performance is somewhat behind GNU grep but it is only
              significant for bigger searches.  The reason is complex, the
              most important factor is that GNU grep uses lots of
              optimizations to improve the speed of the regex library.
              First, we need a modern regex library (practically by adopting
              TRE), add support for GNU-style non-standard regexes and then
              reevalute the performance issues and look for bottlenecks.  In
              the meantime, for those, who need better performance, it is
              possible to build GNU grep by setting WITH_GNU_GREP.

Approved by:            delphij (mentor)
Obtained from:          OpenBSD (http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/grep/),
                        freegrep (http://github.com/howardjp/freegrep)
Sponsored by:           Google SoC 2008
Portbuild tests run by: kris, pav, erwin
Acknowledgements to:    fjoe (as SoC 2008 mentor),
                        everyone who helped in reviewing and testing
2010-07-22 19:11:57 +00:00