Instead of changing the whole course to another POSIX-permitted way
for consistency and uniformity I decide to completely ignore missing
regex fucntionality and focus on fixing bugs in what we have now,
too many small obstacles we have choicing other way, counting ports.
Corresponding libc changes are backed out in r302824.
remove collation support for a-z ranges here too.
It was implemented for single byte locales only in any case.
2) Reduce [Cc]flag loop to WCHAR_MAX, WINT_MAX here includes WEOF which is
not a character.
3) Optimize [Cc]flag case: don't repeatedly add the last character of
string2 to squeeze cset when string2 reach its EOS state.
4) Reflect in the manpage that [=equiv=] is implemented for single
byte locales only.
The index() and rindex() functions were marked LEGACY in the 2001
revision of POSIX and were subsequently removed from the 2008 revision.
The strchr() and strrchr() functions are part of the C standard.
This makes the source code a lot more consistent, as most of these C
files also call into other str*() routines. In fact, about a dozen
already perform strchr() calls.
is in accordance with the information provided at
ftp://ftp.cs.berkeley.edu/pub/4bsd/README.Impt.License.Change
Also add $FreeBSD$ to a few files to keep svn happy.
Discussed with: imp, rwatson
A closing bracket immediately after '[=' should not be treated as special.
Different from the submitted patch, a string ending with '[=' does not cause
access beyond the terminating '\0'.
PR: bin/150384
Submitted by: Richard Lowe
MFC after: 2 weeks
check to see that a given digit is actually an octal digit. This leads to
unusual consequences if passed in values like \9.
Reported by: Joseph Davison (OpenDarwin project)
MFC after: 1 week
data structures that scale better with large character sets, instead of
arrays indexed by character value:
- Sets of characters to delete/squeeze are stored in a new "cset" structure,
which is implemented as a splay tree of extents. This structure has the
ability to store character classes (ala wctype(3)), but this is not
currently fully utilized.
- Mappings between characters are stored in a new "cmap" structure, which
is also a splay tree.
- The parser no longer builds arrays containing all the characters in a
particular class; instead, next() determines them on-the-fly using
nextwctype(3).
1st one is relatively minor: according our own manpage, upper and lower
classes must be sorted, but currently not.
2nd one is serious:
tr '[:lower:]' '[:upper:]'
(and vice versa) currently works only if upper and lower classes
have exact the same number of elements. When it is not true, like for
many ISO8859-x locales which have bigger amount of lowercase letters,
tr may do nasty things.
See this page
http://www.opengroup.org/onlinepubs/007908799/xcu/tr.html
for detailed description of desired tr behaviour in such cases.