24 Commits

Author SHA1 Message Date
jkh
05757f7ee0 tr(1) attempts to convert \n[n][n] sequences into octal digits, but doesn't
check to see that a given digit is actually an octal digit.  This leads to
unusual consequences if passed in values like \9.

Reported by:	Joseph Davison (OpenDarwin project)
MFC after:	1 week
2004-11-14 05:15:25 +00:00
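
The missing check described in this commit can be illustrated with a small C sketch. The name `parse_octal_escape` and its interface are hypothetical and are not taken from the tr(1) source; the sketch only shows the idea of refusing any character outside 0-7 while reading up to three digits after a backslash.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the tr(1) source. `s` points just past the
 * backslash. Reads at most three octal digits, stores their value in
 * *out, and returns the number of digits consumed. Returns 0 (and
 * leaves *out untouched) when the first character is not an octal
 * digit, as in "\9".
 */
static size_t
parse_octal_escape(const char *s, int *out)
{
	size_t n = 0;
	int val = 0;

	while (n < 3 && s[n] >= '0' && s[n] <= '7') {
		val = val * 8 + (s[n] - '0');
		n++;
	}
	if (n > 0)
		*out = val;
	return n;
}
```

A caller that gets 0 back can treat the backslash as escaping a literal character instead of producing a value from a non-octal digit.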
tjr
d291df1e3f Add support for multibyte characters. The challenge here was to use
data structures that scale better with large character sets, instead of
arrays indexed by character value:
- Sets of characters to delete/squeeze are stored in a new "cset" structure,
which is implemented as a splay tree of extents. This structure has the
ability to store character classes (à la wctype(3)), but this is not
currently fully utilized.
- Mappings between characters are stored in a new "cmap" structure, which
is also a splay tree.
- The parser no longer builds arrays containing all the characters in a
particular class; instead, next() determines them on-the-fly using
nextwctype(3).
2004-07-09 02:08:07 +00:00
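
To make the "set of extents" idea concrete, here is a minimal C sketch of membership testing over character ranges. It is an assumption-laden illustration: the commit describes a splay tree, whereas this sketch substitutes a sorted array with binary search for brevity, and the names `struct extent` and `extents_contain` are hypothetical rather than the actual "cset" API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* One extent: the inclusive range of characters [min, max]. */
struct extent {
	wchar_t min;
	wchar_t max;
};

/*
 * Membership test over `n` extents that are sorted by `min` and do
 * not overlap. Assumption: a sorted array stands in for the splay
 * tree mentioned in the commit message.
 */
static bool
extents_contain(const struct extent *ext, size_t n, wchar_t ch)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ch < ext[mid].min)
			hi = mid;		/* look in the lower half */
		else if (ch > ext[mid].max)
			lo = mid + 1;		/* look in the upper half */
		else
			return true;		/* min <= ch <= max */
	}
	return false;
}
```

Either representation costs memory proportional to the number of ranges rather than to the size of the character set, which is the scaling property the commit message is after.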
ache
e771f43084 Back out sorting of the [:upper:] and [:lower:] classes; it is not
required by POSIX and gains nothing with the current code.
2003-08-05 07:59:46 +00:00
ache
71cdd2eb3d No functional changes, just code reorganization from the previous
commit: it makes one malloc unneeded, removes two bzero calls, and makes the
code more readable.

"Bright ideas comes only _after_ commits."
2003-08-04 05:22:06 +00:00
ache
fbdcc3c060 POSIX requires complex processing of 'c-c' ranges: if one of the
endpoints is an octal sequence, the range is taken in byte-value order; for
non-octal endpoints, the range is taken in sorted collation order.

Implement it.
2003-08-04 04:20:04 +00:00
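
The two orderings can be contrasted in a short C sketch. The helper `in_range` and its `by_bytes` flag are hypothetical names chosen for illustration, not identifiers from the tr(1) source, and the sketch handles single-byte characters only.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: report whether `ch` lies in the range lo-hi.
 * When by_bytes is true (an endpoint was written as an octal escape)
 * raw byte values are compared; otherwise the current locale's
 * collation order is used via strcoll(3). The NUL byte is not
 * handled on the collation path, since it cannot form a string.
 */
static bool
in_range(unsigned char lo, unsigned char hi, unsigned char ch, bool by_bytes)
{
	char l[2] = { (char)lo, '\0' };
	char h[2] = { (char)hi, '\0' };
	char c[2] = { (char)ch, '\0' };

	if (by_bytes)
		return lo <= ch && ch <= hi;
	return strcoll(l, c) <= 0 && strcoll(c, h) <= 0;
}
```

In the "C" locale both paths agree; they can differ once setlocale(3) selects a locale whose collation order does not follow byte values.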
ache
a6e8918154 1) Fix -C: it has been broken since it was introduced (the wrong array was sorted)
2) Fix the last (repeated) char after [:class:]; it was \0 in the original code
2003-08-03 22:02:49 +00:00
ache
b97366a236 POSIX requires that 'c-c' ranges conform to the collating sequence and be in collation order 2003-08-03 03:51:27 +00:00
ache
0113a19ead This patch addresses two problems.
The first is relatively minor: according to our own manpage, the upper and
lower classes must be sorted, but currently they are not.

The second is serious:
	tr '[:lower:]' '[:upper:]'
	(and vice versa) currently works only if the upper and lower classes
	have exactly the same number of elements. When that is not true, as in
	many ISO8859-x locales, which have more lowercase letters than
	uppercase ones, tr may do nasty things.

	See this page
	http://www.opengroup.org/onlinepubs/007908799/xcu/tr.html
	for a detailed description of the desired tr behaviour in such cases.
2003-08-03 02:23:39 +00:00
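
One way to avoid depending on the two classes having equal sizes is to build the translation table from the locale's toupper(3) mapping instead of pairing the n-th lowercase letter with the n-th uppercase letter. The C sketch below illustrates that idea for single-byte locales; `build_lower_to_upper` is a hypothetical name, and this is not necessarily how the patch itself is written.

```c
#include <ctype.h>
#include <limits.h>

/*
 * Fill map[] so that map[c] is the translation of byte c for
 * tr '[:lower:]' '[:upper:]'. Each lowercase byte is mapped through
 * toupper(3); every other byte maps to itself. A lowercase letter
 * with no uppercase counterpart is left unchanged by toupper(3), so
 * the two classes need not be the same size.
 */
static void
build_lower_to_upper(unsigned char map[UCHAR_MAX + 1])
{
	int c;

	for (c = 0; c <= UCHAR_MAX; c++)
		map[c] = islower(c) ? (unsigned char)toupper(c) :
		    (unsigned char)c;
}
```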
tjr
3594350f00 Use err instead of errx when malloc fails. "malloc" is not a helpful
error message.
2002-07-05 09:28:13 +00:00
tjr
361d0dd8a7 Improve parsing of character and equivalence classes:
[:*] and [=*] are parsed as `infinitely many repetitions of :' (or =)
instead of literal characters (SUSv3)
2002-06-15 07:38:27 +00:00
tjr
5a8b5dcfa4 Don't treat the trailing ']' of an equivalence class expression as a
character in the set. tr -d '[=a=]' was deleting ]'s as well as a's.
Noticed by the textutils test suite.
2002-06-14 09:53:11 +00:00
tjr
0c8a9db6f9 Implement support for equivalence classes ([=e=]) when the mapping is
one-to-one (SUSv3)
2002-06-14 07:37:08 +00:00
imp
0b20191705 remove __P 2002-03-22 01:42:45 +00:00
markm
eddee66a58 WARNS=2 fixes, use __FBSDID(), kill register keyword. 2001-12-11 23:36:25 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
charnier
6473d1562f Use err(3) instead of local redefinition. Cosmetic in usage(). 1997-08-18 07:24:58 +00:00
joerg
46c1f410f7 Cast char's to (u_char) before passing them to isctype() functions. 1996-03-19 21:21:06 +00:00
joerg
e34654d225 Fix a couple of sign-extension bugs.
Submitted by:	serg@bcs1.bcs.zaporizhzhe.ua (Sergey Shkonda)
1996-03-17 09:00:48 +00:00
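
The sign-extension problem behind this fix and the (u_char) cast in the following commit can be shown in a few lines of C. The wrapper name `is_alpha_byte` is hypothetical; the underlying rule is that the <ctype.h> functions expect an int holding either EOF or a value representable as unsigned char.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Where plain char is signed, a byte such as 0xE9 becomes a negative
 * int when passed directly to isalpha(3), which is outside the range
 * the function is defined for. Casting to unsigned char first keeps
 * the argument in 0..UCHAR_MAX.
 */
static bool
is_alpha_byte(char c)
{
	return isalpha((unsigned char)c) != 0;
}
```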
bde
4fc868ceb3 Updated to BSD4.4lite2. Fixes PR836. `echo abcd | tr a-d A-BC-D' now
works.
1995-11-28 13:18:47 +00:00
ache
7c02939670 Fix broken charclass handling
Add setlocale LC_CTYPE
1995-10-28 22:27:03 +00:00
phk
ec905d38f8 Remove declarations which <ctype.h> already does for us. 1995-10-21 22:02:10 +00:00
phk
64993ba21b Added #include <ctype.h> 1995-10-21 21:08:43 +00:00
ache
17e2f636ce Fix print class mistype 1994-10-28 23:31:48 +00:00
rgrimes
f9ab90d9d6 BSD 4.4 Lite Usr.bin Sources 1994-05-27 12:33:43 +00:00