110 Commits

Author SHA1 Message Date
tjr
ca547df2d3 Remove unused member of struct csclass: csc_value. 2004-07-14 08:35:11 +00:00
tjr
953a00fce4 Splay the left and right subtrees on min - 1 and max + 1, respectively,
before trying to coalesce. Forgetting to splay caused us to miss many
opportunities for coalescing.
2004-07-14 08:33:14 +00:00
tjr
3f97d8af9c Initialize cs_invert to "false" in new csets. 2004-07-10 06:28:18 +00:00
tjr
3e5d71bd1a Report input errors instead of ignoring them. 2004-07-09 05:15:46 +00:00
tjr
d7311fa844 Update for multibyte character support: remove BUGS and change the
description of the -c option to refer to "values" instead of "byte values".
2004-07-09 02:33:46 +00:00
tjr
d291df1e3f Add support for multibyte characters. The challenge here was to use
data structures that scale better with large character sets, instead of
arrays indexed by character value:
- Sets of characters to delete/squeeze are stored in a new "cset" structure,
which is implemented as a splay tree of extents. This structure has the
ability to store character classes (à la wctype(3)), but this is not
currently fully utilized.
- Mappings between characters are stored in a new "cmap" structure, which
is also a splay tree.
- The parser no longer builds arrays containing all the characters in a
particular class; instead, next() determines them on-the-fly using
nextwctype(3).
2004-07-09 02:08:07 +00:00
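The d291df1e3f message above describes storing character sets as extents rather than as arrays indexed by character value, and 953a00fce4 mentions coalescing with the neighbours at min - 1 and max + 1. The following is a minimal sketch of that idea only. It swaps the splay tree for a sorted linked list to stay short, and every name in it (`struct extent`, `extent_set_add`, and so on) is hypothetical, not taken from the tr(1) sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical types: a sorted list of [min, max] extents, not the real cset. */
struct extent {
	wchar_t min;
	wchar_t max;
	struct extent *next;
};

struct extent_set {
	struct extent *head;	/* sorted by min; extents never overlap or touch */
};

static bool
extent_set_contains(const struct extent_set *s, wchar_t ch)
{
	const struct extent *e;

	for (e = s->head; e != NULL && e->min <= ch; e = e->next)
		if (ch <= e->max)
			return true;
	return false;
}

/* Add ch, growing or merging neighbouring extents where possible. */
static bool
extent_set_add(struct extent_set *s, wchar_t ch)
{
	struct extent **link = &s->head, *e, *n;

	/* Skip extents that end before ch - 1: they cannot hold or touch ch. */
	while ((e = *link) != NULL && (long long)e->max + 1 < (long long)ch)
		link = &e->next;

	if (e != NULL && (long long)e->min - 1 <= (long long)ch) {
		/* ch lies inside e or directly next to it. */
		if (ch < e->min)
			e->min = ch;
		if (ch > e->max)
			e->max = ch;
		/* Growing upward may make e touch its successor: coalesce. */
		n = e->next;
		if (n != NULL && (long long)e->max + 1 >= (long long)n->min) {
			e->max = n->max;
			e->next = n->next;
			free(n);
		}
		return true;
	}

	/* No neighbour to extend: insert a new single-character extent. */
	n = malloc(sizeof(*n));
	if (n == NULL)
		return false;
	n->min = n->max = ch;
	n->next = e;
	*link = n;
	return true;
}

static size_t
extent_set_count(const struct extent_set *s)
{
	const struct extent *e;
	size_t n = 0;

	for (e = s->head; e != NULL; e = e->next)
		n++;
	return n;
}
```

Adding 'a' and 'c' yields two extents; adding 'b' afterwards joins them into a single extent covering 'a' through 'c'. A list makes lookups linear, which is the cost a splay tree avoids.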
ru
fb1d8b3724 Mechanically kill hard sentence breaks. 2004-07-02 22:22:35 +00:00
tjr
2c569caec5 Document incorrect handling of multibyte characters in input files
and character string arguments.
2004-06-28 07:19:11 +00:00
ache
e771f43084 Back out [:upper:] and [:lower:] classes sorting, it is not required
by POSIX and gains nothing with current code.
2003-08-05 07:59:46 +00:00
ache
1bdac03b07 Clarify upper/lower conversion description more. 2003-08-05 07:53:28 +00:00
ache
3a19fb4eb0 Explain better what happens when [:lower:] <-> [:upper:] 2003-08-05 06:00:00 +00:00
ache
71cdd2eb3d No functional changes, just code reorganization from prev. commit, it
makes one malloc unneeded, removes two bzero's and makes code more readable.

"Bright ideas comes only _after_ commits."
2003-08-04 05:22:06 +00:00
ache
fbdcc3c060 POSIX requires complex processing of 'c-c' ranges: if one of the endpoints
is an octal sequence, the range is taken in byte value order; for non-octal
endpoints, the range is taken in sorted collation order.

Implement it.
2003-08-04 04:20:04 +00:00
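The rule in the fbdcc3c060 message above can be sketched as a membership test for single-byte characters. This is an illustration under stated assumptions, not the code from that commit: `coll_cmp` and `in_range` are hypothetical names, and the caller is assumed to know whether an endpoint was written as an octal escape.

```c
#include <stdbool.h>
#include <string.h>

/* Compare two single-byte characters in the current locale's collation order. */
static int
coll_cmp(unsigned char a, unsigned char b)
{
	char sa[2] = { (char)a, '\0' };
	char sb[2] = { (char)b, '\0' };

	return strcoll(sa, sb);
}

/*
 * Hypothetical helper: is c inside the range lo-hi?
 * octal_endpoint is true when either endpoint was given as an octal escape.
 */
static bool
in_range(unsigned char c, unsigned char lo, unsigned char hi,
    bool octal_endpoint)
{
	if (octal_endpoint)
		return lo <= c && c <= hi;	/* byte value order */
	return coll_cmp(lo, c) <= 0 && coll_cmp(c, hi) <= 0;	/* collation order */
}
```

In the "C" locale the two orders agree, so the difference only shows up after setlocale(3) selects a locale whose collation order differs from byte order.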
ache
e98015074a Special fix just for
tr -[cC]s '[:upper:]' '[:lower:]'
case (or vice versa):
chars taken from s2 can be different this time
due to lack of complex upper/lower processing,
so fill string2 again to not miss some.
2003-08-04 02:57:17 +00:00
ache
a506e844d1 Microoptimization of prev. patch: do strdup() only if (cflag || Cflag) 2003-08-03 22:19:43 +00:00
ache
a6e8918154 1) Fix -C - it was broken since introduced, wrong array sorted
2) Fix last (repeated) char after [:class:], it was \0 in original code
2003-08-03 22:02:49 +00:00
ache
80172fba76 Remove charcoll() stabilization added in 1.16, it gains nothing but conflicts
with ranges.
2003-08-03 04:18:07 +00:00
ache
b97366a236 POSIX requires that 'c-c' ranges conform to collation and be in collation order 2003-08-03 03:51:27 +00:00
ache
0113a19ead This patch addresses two problems.
The first is relatively minor: according to our own manpage, the upper and lower
classes must be sorted, but currently they are not.

The second is serious:
	tr '[:lower:]' '[:upper:]'
	(and vice versa) currently works only if the upper and lower classes
	have exactly the same number of elements. When that is not true, as in
	many ISO8859-x locales that have more lowercase letters,
	tr may do nasty things.

	See this page
	http://www.opengroup.org/onlinepubs/007908799/xcu/tr.html
	for a detailed description of the desired tr behaviour in such cases.
2003-08-03 02:23:39 +00:00
schweikh
86f7487fb6 Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.
2002-12-30 21:18:15 +00:00
ru
b67068895d mdoc(7) police: markup polishing.
Approved by:	re
2002-11-26 17:33:37 +00:00
charnier
dcefc83b9c Use .Fl/Ar for flags and arguments. 2002-10-17 13:04:49 +00:00
dwmalone
b4339b74ad ANSIify function definitions.
Add some constness to avoid some warnings.
Remove use of register keyword.
Deal with missing/unneeded extern/prototypes.
Some minor type changes/casts to avoid warnings.

Reviewed by:	md5
2002-09-04 23:29:10 +00:00
tjr
6d20181e60 When translating and -C is specified, behave as if the complemented set was
in the locale collating order as required by SUSv3.
2002-07-29 23:42:00 +00:00
tjr
fb4f03d256 When translating and the -c option is specified, handle the case where the
second string argument is more than one character in length in the way
required by SUSv3 (and the way GNU textutils and SVR4 do it).
2002-07-29 14:50:54 +00:00
tjr
3594350f00 Use err instead of errx when malloc fails. "malloc" is not a helpful
error message.
2002-07-05 09:28:13 +00:00
tjr
361d0dd8a7 Improve parsing of character and equivalence classes:
[:*] and [=*] are parsed as `infinitely many repetitions of :' (or =)
instead of literal characters (SUSv3)
2002-06-15 07:38:27 +00:00
tjr
1fa61e7038 Move the #include and #define's to the top of the file. 2002-06-14 15:56:52 +00:00
tjr
38575dbfdf Bump the size of the equivalence set to NCHARS; this file was left out
of a previous commit implementing equivalence classes.
2002-06-14 15:53:38 +00:00
tjr
f8e6c7f292 Sort sections. Avoid using "The -? option" at the start of option descriptions. 2002-06-14 10:11:41 +00:00
tjr
5a8b5dcfa4 Don't treat the trailing ']' of an equivalence class expression as a
character in the set. tr -d '[=a=]' was deleting ]'s as well as a's.
Noticed by the textutils test suite.
2002-06-14 09:53:11 +00:00
tjr
29924b60f3 Add the P1003.1-2001 -C option which complements the set of characters
(not byte values) specified by the first string argument.
2002-06-14 08:58:30 +00:00
tjr
0c8a9db6f9 Implement support for equivalence classes ([=e=]) when the mapping is
one-to-one (SUSv3)
2002-06-14 07:37:08 +00:00
imp
0b20191705 remove __P 2002-03-22 01:42:45 +00:00
alfred
4461fe4699 properly handle zero length first string when doing -c
PR: 34663
MFC After: 3 days
2002-03-02 10:36:37 +00:00
markm
eddee66a58 WARNS=2 fixes, use __FBSDID(), kill register keyword. 2001-12-11 23:36:25 +00:00
ru
bde8ec1b70 mdoc(7) police: utilize the new .Ex macro. 2001-08-15 09:09:47 +00:00
ru
24c7b0a61d mdoc(7) police: s/BSD/.Bx/ where appropriate. 2001-08-14 10:01:54 +00:00
dd
911ca14c87 Remove whitespace at EOL. 2001-07-15 08:06:20 +00:00
ru
83c24735dc mdoc(7) police: -column lists require column width specifiers. 2001-07-06 10:07:43 +00:00
ru
8a6f8b5fe4 mdoc(7) police: split punctuation characters + misc fixes. 2001-02-01 16:38:02 +00:00
ru
e6cfc0711d Prepare for mdoc(7)NG. 2000-12-19 16:00:12 +00:00
ru
0d1334ca0c mdoc(7) police: use the new features of the Nm macro. 2000-11-20 19:21:22 +00:00
ru
a6f5d950d8 Avoid use of direct troff requests in mdoc(7) manual pages. 2000-11-10 17:46:15 +00:00
charnier
42d5955dc9 Add DIAGNOSTICS section name 2000-03-26 15:06:46 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
nik
6578739ddb Add $Id$, to make it simpler for members of the translation teams to
track.

The $Id$ line is normally at the bottom of the main comment block in the
man page, separated from the rest of the manpage by an empty comment,
like so;

     .\"    $Id$
     .\"

If the immediately preceding comment is a @(#) format ID marker then
the $Id$ will line up underneath it with no intervening blank lines.
Otherwise, an additional blank line is inserted.

Approved by:            bde
1999-07-12 20:24:20 +00:00
helbig
ba81afcca2 Submitted by: Joachim Kuebart, thanks.
Add -u option to force unbuffered output
1997-10-12 09:52:49 +00:00
charnier
6473d1562f Use err(3) instead of local redefinition. Cosmetic in usage(). 1997-08-18 07:24:58 +00:00
imp
141381e1cb compare return value from getopt against -1 rather than EOF, per the final
posix standard on the topic.
1997-03-29 04:34:07 +00:00
peter
deba7db48c Merge from Lite2 1997-03-11 13:43:33 +00:00
joerg
46c1f410f7 Cast char's to (u_char) before passing them to isctype() functions. 1996-03-19 21:21:06 +00:00
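The cast mentioned in 46c1f410f7 above (and the sign-extension bugs in e34654d225 below) matter because plain char may be signed, and passing a negative value other than EOF to the <ctype.h> functions is undefined behaviour. The sketch below uses the portable `unsigned char` spelling instead of `u_char`; `count_upper` is a hypothetical helper, not a function from tr(1).

```c
#include <ctype.h>
#include <stddef.h>

/* Count uppercase characters in s according to the current locale. */
static size_t
count_upper(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++) {
		/* Cast first: a byte >= 0x80 would otherwise be sign-extended. */
		if (isupper((unsigned char)*s))
			n++;
	}
	return n;
}
```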
joerg
e34654d225 Fix a couple of sign-extension bugs.
Submitted by:	serg@bcs1.bcs.zaporizhzhe.ua (Sergey Shkonda)
1996-03-17 09:00:48 +00:00
bde
4fc868ceb3 Updated to BSD4.4lite2. Fixes PR836. `echo abcd | tr a-d A-BC-D' now
works.
1995-11-28 13:18:47 +00:00
ache
7c02939670 Fix broken charclass handling
Add setlocale LC_CTYPE
1995-10-28 22:27:03 +00:00
phk
ec905d38f8 Remove declarations which <ctype.h> already does for us. 1995-10-21 22:02:10 +00:00
phk
64993ba21b Added #include <ctype.h> 1995-10-21 21:08:43 +00:00
rgrimes
a14d555c87 Remove trailing whitespace. 1995-05-30 06:41:30 +00:00
ache
17e2f636ce Fix print class mistype 1994-10-28 23:31:48 +00:00
rgrimes
f9ab90d9d6 BSD 4.4 Lite Usr.bin Sources 1994-05-27 12:33:43 +00:00