load of _l suffixed versions of various standard library functions that use
the global locale, making them take an explicit locale parameter. Also
adds support for per-thread locales. This work was funded by the FreeBSD
Foundation.
Please test any code you have that uses the C standard locale functions!
Reviewed by: das (gdtoa changes)
Approved by: dim (mentor)
An additional one coming from http://www.research.att.com/~gsf/testregex/
was not added; at some point the entire AT&T regression test harness
should be imported here.
But that would also mean commitment to fix the uncovered errors.
PR: 130504
Submitted by: Chris Kuklewicz
that belong in a character class, and (2) one for matching all
the characters *not* in a character class.
Submitted by: Mark B, mkbucc at gmail.com
MFC after: 3 days
inadvertently match a negative char in the RE being compiled.
This fixes compilation of "\376" (as an ERE) and "\376\376" (as a BRE).
PR: 84740
MFC after: 1 week
reading past 'stop' in various places when converting multibyte characters.
Reading too far caused truncation to not be detected when it should have
been, eventually causing regexec() to loop infinitely in with certain
combinations of patterns and strings in multibyte locales.
PR: 74020
MFC after: 4 weeks
multibyte character support:
- In CHadd(), avoid writing past the end of the character set bitmap when
the opposite-case counterpart of wide characters with values less than
NC have values greater than or equal to NC.
- In CHaddtype(), fix a braino that caused alphabetic characters to be
added to all character classes! (but only with REG_ICASE)
PR: 71367
idea is that we perform multibyte->wide character conversion while parsing
and compiling, then convert byte sequences to wide characters when they're
needed for comparison and stepping through the string during execution.
As with tr(1), the main complication is to efficiently represent sets of
characters in bracket expressions. The old bitmap representation is replaced
by a bitmap for the first 256 characters combined with a vector of individual
wide characters, a vector of character ranges (for [A-Z] etc.), and a vector
of character classes (for [[:alpha:]] etc.).
One other point of interest is that although the Boyer-Moore algorithm had
to be disabled in the general multibyte case, it is still enabled for UTF-8
because of its self-synchronizing nature. This greatly speeds up matching
by reducing the number of multibyte conversions that need to be done.
Only warnings that could be fixed without changing the generated object
code and without restructuring the source code have been handled.
Reviewed by: /sbin/md5