July 27, 2021:
As per IEEE Std 1003.1-2008, -F "str" is now consistent with
-v FS="str" when str is null. Thanks to Warner Losh.
July 24, 2021:
Fix readrec's definition of a record. This fixes an issue
with NetBSD's RS regular expression support that can cause
an infinite read loop. Thanks to Miguel Pineiro Jr.
Fix regular expression RS ^-anchoring. RS ^-anchoring needs to
know if it is reading the first record of a file. This change
restores a missing line that was overlooked when porting NetBSD's
RS regex functionality. Thanks to Miguel Pineiro Jr.
Fix size computation in replace_repeat() for special case
REPEAT_WITH_Q. Thanks to Todd C. Miller.
Also, included the tests from upstream, though they aren't yet connected
to the tree.
Sponsored by: Netflix
We normally don't add $FreeBSD$ to contrib software. However, these
changes date back to the CVS era of source code management and have been
overlooked. Now that all these files are back to the same as the
upstream bsd-features branch, remove the FreeBSD specific changes, which
are now just $FreeBSD$ and the (FreeBSD) in the version string.
MFC After: 2 weeks
Sponsored by: Netflix
In 2005, FreeBSD changed one-true-awk to honor the locale's collating
order. This was billed as a temporary patch. It was also compatible with
the then-current behavior of gawk. That temporary patch has lasted 16
years now.
However, IEEE Std 1003.1-2008 changed the behaivor of ranges in regular
expressions outside of the "C" and "POSIX" locales to be undefined.
Starting in 2011, gawk 4.0 stopped using the locale for the range
regular expressions and used the traditional behavior only. The
maintainer had grown weary of answering why '[A-Z]' would sometimes
match lower-case expressions. The details about are explained here:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
To restore compatibility with other implementaitons of awk, revert this
patch. FreeBSD is the odd-system out. It also has the nice side effect
of eliminating the last of our differences with upstream one-true-awk.
Reviewed by: cy, rgrimes
MFC After: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31114
In the merge of 20210215, I left two merge conflicts #if 0'd by mistake
to check later rather than resolve them as part of the merge. This code
turns out to be from the original one-true-awk import and not FreeBSD
specific, so remove them.
Remove a extra definition of HAT.
Remove a stylistic change that also appears to be a mismerge along the
way.
Remove FREEBSD-upgrade. Nobody has updated it since the original 2007
cvs import. It talks about old CVS branches that never made it into svn,
let alone git. New imports will follow the standard practices now, so
there's nothing left to document.
Move README to README.md and copy the README.md from upstream over.
This leaves just the $FreeBSD$ lines (which remain for the stable/12
merge) and the strcoll part of ru@'s r201989/d98dd8e5f94c as the only
diffs with upstream. FreeBSD also still has its own man page, which I
don't plan on changing. Once this commit is merged to stable/12, I plan
no further merges to stable/12. Sometime after that I'll remove the
$FreeBSD$ lines to reduce the diffs even more (though i want to make
sure plans won't change first). I also plan to talk to upstream about
this change...
MFC After: 2 weeks
Sponsored by: Netflix
Import the latest bsd-features branch of the one-true-awk upstream:
o Move to bison for $YACC
o Set close-on-exec flag for file and pipe redirects that aren't std*
o lots of little fixes to modernize ocde base
o free sval member before setting it
o fix a bug where a{0,3} could match aaaa
o pull in systime and strftime from NetBSD awk
o pull in fixes from {Net,Free,Open}BSD (normalized our code with them)
o add BSD extensions and, or, xor, compl, lsheift, rshift (mostly a nop)
Also revert a few of the trivial FreeBSD changes that were done slightly
differently in the upstreaming process. Also, our PR database may have
been mined by upstream for these fixes, and Mikolaj Golub may deserve
credit for some of the fixes in this update.
Suggested by: Mikolaj Golub <to.my.trociny@gmail.com>
PR: 143363,143365,143368,143369,143373,143375,214782
Sponsored by: Netflix
When matching a regex with ^, it would attempt to access
gototab[NSTATES][NCHARS+2], and therefore access the state for the \002
character instead. This change is required to run awk under CHERI (with
sub-object bounds) and when running with UBSan instrumentation.
This was committed upstream as cbf924342b
Found by: CHERI (with subobject bounds enabled)
Obtained from: CheriBSD
Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D26509
Note: this backs out a number of changes we've made to awk because
they aren't upstream, but are on the vendor branch. Those will be
reapplied. svn makes it needlessly difficult to know which ones, but
at least r315426, r301289, and maybe r301691, though there may be
others too. None of these are critical, so bisecting through this
point is safe for all but awk regression tests :).
$ echo x | awk '/[[:cntrl:]]/'
x
The NUL character in cntrl class truncates the pattern, and an empty
pattern matches anything. The patch skips NUL as a quick fix.
PR: 195792
Submitted by: kdrakehp@zoho.com
Approved by: bwk@cs.princeton.edu (the author)
MFC after: 3 days
Instead of changing the whole course to another POSIX-permitted way
for consistency and uniformity I decide to completely ignore missing
regex fucntionality and focus on fixing bugs in what we have now,
too many small obstacles we have choicing other way, counting ports.
Corresponding libc changes are backed out in r302824.
I'll try to keep the change very minimal to not touch contribed code much.
I'll send it upstream when it will be merged to main branches,
but we need the change right now here.
- Make one-true-awk respect locale's collating order in [a-z]
bracket expressions, until a more complete fix (like handing
BREs) is ready.
- Don't require a space between -[fv] and its argument.