63 Commits

Author SHA1 Message Date
Tim J. Robbins
26bdc1d12e Tweak markup of quoted strings and characters: use Dq instead of enclosing
strings in ``obsolete quotes''. Use Li and Ql where appropriate.
2004-07-23 06:06:58 +00:00
Tim J. Robbins
0b651019b4 Add a lengthy discussion of why "tr a-z A-Z" and "tr A-Z a-z" are not the
right way to perform case-conversion.
2004-07-23 05:44:04 +00:00
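The difference between the range idiom and the class idiom can be sketched in C. This is a minimal illustration, not code from tr itself: range_upper(), class_upper() and convert() are hypothetical names, and the range version assumes that a-z is contiguous in the execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Range idiom: only the 26 unaccented letters, assuming a-z is contiguous. */
static wchar_t
range_upper(wchar_t ch)
{
	if (ch >= L'a' && ch <= L'z')
		return (ch - L'a' + L'A');
	return (ch);
}

/* Class idiom: ask the current locale for the uppercase form. */
static wchar_t
class_upper(wchar_t ch)
{
	return ((wchar_t)towupper((wint_t)ch));
}

/* Convert a wide string in place with the given per-character mapping. */
static void
convert(wchar_t *s, wchar_t (*map)(wchar_t))
{
	assert(s != NULL);
	for (; *s != L'\0'; s++)
		*s = map(*s);
}
```

For plain ASCII text both mappings agree. The range version never touches accented letters such as U+00E9, whereas the class version defers to whatever the active locale defines, which is the behaviour '[:lower:]' and '[:upper:]' are meant to give.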
Tim J. Robbins
c75b843169 Fix description of cmap_lookup_hard(). 2004-07-14 08:36:09 +00:00
Tim J. Robbins
9aed43ae23 Remove unused member of struct csclass: csc_value. 2004-07-14 08:35:11 +00:00
Tim J. Robbins
cfab3bdd89 Splay the left and right subtrees on min - 1 and max + 1, respectively,
before trying to coalesce. Forgetting to splay caused us to miss many
opportunities for coalescing.
2004-07-14 08:33:14 +00:00
Tim J. Robbins
9c8fd487a5 Initialize cs_invert to "false" in new csets. 2004-07-10 06:28:18 +00:00
Tim J. Robbins
9409835314 Report input errors instead of ignoring them. 2004-07-09 05:15:46 +00:00
Tim J. Robbins
e263a4b46e Update for multibyte character support: remove BUGS and change the
description of the -c option to refer to "values" instead of "byte values".
2004-07-09 02:33:46 +00:00
Tim J. Robbins
ca99cfdd14 Add support for multibyte characters. The challenge here was to use
data structures that scale better with large character sets, instead of
arrays indexed by character value:
- Sets of characters to delete/squeeze are stored in a new "cset" structure,
which is implemented as a splay tree of extents. This structure has the
ability to store character classes (ala wctype(3)), but this is not
currently fully utilized.
- Mappings between characters are stored in a new "cmap" structure, which
is also a splay tree.
- The parser no longer builds arrays containing all the characters in a
particular class; instead, next() determines them on-the-fly using
nextwctype(3).
2004-07-09 02:08:07 +00:00
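The idea of storing a character set as extents rather than as an array indexed by character value can be sketched as follows. This is not the cset code: to keep the example short it swaps the splay tree for a small sorted array with binary search, and struct extent, struct extent_set, set_add() and set_contains() are hypothetical names. It also shows the coalescing of adjacent extents that a later commit in this log fixes for the tree version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

#define MAX_EXTENTS 64	/* arbitrary capacity for this sketch */

/* One run of consecutive character values, inclusive on both ends. */
struct extent {
	wchar_t min;
	wchar_t max;
};

/* Hypothetical stand-in for a cset: extents kept sorted and disjoint. */
struct extent_set {
	struct extent ext[MAX_EXTENTS];
	size_t count;
};

/* Index of the first extent whose max is >= ch (count if none). */
static size_t
lower_bound(const struct extent_set *s, wchar_t ch)
{
	size_t lo = 0, hi = s->count;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (s->ext[mid].max < ch)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo);
}

static bool
set_contains(const struct extent_set *s, wchar_t ch)
{
	size_t i = lower_bound(s, ch);

	return (i < s->count && s->ext[i].min <= ch);
}

/* Add ch, merging with neighbours that end at ch - 1 or start at ch + 1. */
static bool
set_add(struct extent_set *s, wchar_t ch)
{
	size_t i = lower_bound(s, ch);
	bool joins_left, joins_right;

	assert(s->count <= MAX_EXTENTS);
	if (i < s->count && s->ext[i].min <= ch)
		return (true);		/* already present */
	joins_left = i > 0 && s->ext[i - 1].max == ch - 1;
	joins_right = i < s->count && s->ext[i].min == ch + 1;
	if (joins_left && joins_right) {
		/* ch fills the one-character gap between two extents. */
		s->ext[i - 1].max = s->ext[i].max;
		memmove(&s->ext[i], &s->ext[i + 1],
		    (s->count - i - 1) * sizeof(s->ext[0]));
		s->count--;
	} else if (joins_left) {
		s->ext[i - 1].max = ch;
	} else if (joins_right) {
		s->ext[i].min = ch;
	} else {
		if (s->count == MAX_EXTENTS)
			return (false);	/* sketch only: no dynamic growth */
		memmove(&s->ext[i + 1], &s->ext[i],
		    (s->count - i) * sizeof(s->ext[0]));
		s->ext[i].min = s->ext[i].max = ch;
		s->count++;
	}
	return (true);
}
```

Storage grows with the number of runs rather than with the size of the character set, so a class covering thousands of consecutive code points costs a single extent.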
Ruslan Ermilov
6a3e8b0adc Mechanically kill hard sentence breaks. 2004-07-02 22:22:35 +00:00
Tim J. Robbins
6863e5bed2 Document incorrect handling of multibyte characters in input files
and character string arguments.
2004-06-28 07:19:11 +00:00
Andrey A. Chernov
035944c3b6 Back out [:upper:] and [:lower:] classes sorting, it is not required
by POSIX and gains nothing with current code.
2003-08-05 07:59:46 +00:00
Andrey A. Chernov
8ad968ee96 Clarify upper/lower conversion description more. 2003-08-05 07:53:28 +00:00
Andrey A. Chernov
bc44c44a14 Explain better what happens when [:lower:] <-> [:upper:] 2003-08-05 06:00:00 +00:00
Andrey A. Chernov
30c1156451 No functional changes, just code reorganization from prev. commit, it
makes one malloc unneeded, removes two bzero's and makes code more readable.

"Bright ideas comes only _after_ commits."
2003-08-04 05:22:06 +00:00
Andrey A. Chernov
21f53e9138 POSIX requires complex processing of 'c-c' ranges: if one of the endpoints
is an octal sequence, the range is taken in byte-value order; for non-octal
endpoints, the range is taken in sorted collation order.

Implement it.
2003-08-04 04:20:04 +00:00
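The two orderings described in this commit can be sketched as a membership test. This is an illustration under assumptions, not the code from the commit: coll_cmp() and range_contains() are hypothetical names, the octal_endpoint flag stands in for whatever the parser records about how an endpoint was written, and callers are assumed to have rejected reversed ranges already.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>

/* Compare two characters by the current locale's collation order. */
static int
coll_cmp(wchar_t a, wchar_t b)
{
	wchar_t sa[2] = { a, L'\0' };
	wchar_t sb[2] = { b, L'\0' };

	return (wcscoll(sa, sb));
}

/*
 * Is ch inside the range lo-hi?  If either endpoint was written as an
 * octal escape, compare raw values; otherwise compare by collation.
 */
static bool
range_contains(wchar_t lo, wchar_t hi, wchar_t ch, bool octal_endpoint)
{
	if (octal_endpoint) {
		assert(lo <= hi);
		return (lo <= ch && ch <= hi);
	}
	assert(coll_cmp(lo, hi) <= 0);
	return (coll_cmp(lo, ch) <= 0 && coll_cmp(ch, hi) <= 0);
}
```

In the default "C" locale the two branches agree, because collation order there is the same as value order. They can diverge once setlocale(3) selects a locale whose collation interleaves upper and lower case or sorts accented letters next to their base letters.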
Andrey A. Chernov
796263418b Special fix just for
tr -[cC]s '[:upper:]' '[:lower:]'
case (or vice versa):
chars taken from s2 can be different this time
due to lack of complex upper/lower processing,
so fill string2 again to not miss some.
2003-08-04 02:57:17 +00:00
Andrey A. Chernov
d7da7302f9 Microoptimization of prev. patch: do strdup() only if (cflag || Cflag) 2003-08-03 22:19:43 +00:00
Andrey A. Chernov
e42eb6838e 1) Fix -C: it had been broken since it was introduced (the wrong array was sorted)
2) Fix last (repeated) char after [:class:], it was \0 in original code
2003-08-03 22:02:49 +00:00
Andrey A. Chernov
761c008c99 Remove charcoll() stabilization added in 1.16, it gains nothing but conflicts
with ranges.
2003-08-03 04:18:07 +00:00
Andrey A. Chernov
a508a04d43 POSIX requires that 'c-c' ranges conform to the locale's collation and be in collation order 2003-08-03 03:51:27 +00:00
Andrey A. Chernov
00611f0457 This patch addresses two problems.
The first is relatively minor: according to our own manpage, the upper and lower
classes must be sorted, but currently they are not.

The second is serious:
	tr '[:lower:]' '[:upper:]'
	(and vice versa) currently works only if the upper and lower classes
	have exactly the same number of elements. When they do not, as in
	many ISO8859-x locales that have more lowercase letters,
	tr may do nasty things.

	See this page
	http://www.opengroup.org/onlinepubs/007908799/xcu/tr.html
	for detailed description of desired tr behaviour in such cases.
2003-08-03 02:23:39 +00:00
Jens Schweikhardt
d64ada501a Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.
2002-12-30 21:18:15 +00:00
Ruslan Ermilov
06e482e60a mdoc(7) police: markup polishing.
Approved by:	re
2002-11-26 17:33:37 +00:00
Philippe Charnier
b9a86ec995 Use .Fl/Ar for flags and arguments. 2002-10-17 13:04:49 +00:00
David Malone
f4ac32def2 ANSIify function definitions.
Add some constness to avoid some warnings.
Remove use of the register keyword.
Deal with missing/unneeded extern/prototypes.
Some minor type changes/casts to avoid warnings.

Reviewed by:	md5
2002-09-04 23:29:10 +00:00
Tim J. Robbins
6e9c52b638 When translating and -C is specified, behave as if the complemented set was
in the locale collating order as required by SUSv3.
2002-07-29 23:42:00 +00:00
Tim J. Robbins
482711cfa6 When translating and the -c option is specified, handle the case where the
second string argument is more than one character in length in the way
required by SUSv3 (and the way GNU textutils and SVR4 do it).
2002-07-29 14:50:54 +00:00
Tim J. Robbins
7dd4ac68f1 Use err instead of errx when malloc fails. "malloc" is not a helpful
error message.
2002-07-05 09:28:13 +00:00
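The distinction this commit relies on is that errx(3) prints only the program name and the supplied message, while err(3) also appends the strerror(3) text for the current errno. A minimal sketch, assuming the BSD <err.h> interface is available (it is on the BSDs and in glibc); xmalloc() is a hypothetical wrapper, not a function from tr:

```c
#include <assert.h>
#include <err.h>
#include <stdlib.h>

/* Allocate size bytes or exit with a message that says why it failed. */
static void *
xmalloc(size_t size)
{
	void *p;

	/* malloc(0) may return NULL without an error, so rule it out. */
	assert(size != 0);
	p = malloc(size);
	if (p == NULL)
		err(1, "malloc");	/* appends ": " and strerror(errno) */
	return (p);
}
```

With errx(1, "malloc") the user would see only the program name followed by "malloc"; with err() the same line also carries the system's reason for the failure.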
Tim J. Robbins
232a0ff51d Improve parsing of character and equivalence classes:
[:*] and [=*] are parsed as `infinitely many repetitions of :' (or =)
instead of literal characters (SUSv3)
2002-06-15 07:38:27 +00:00
Tim J. Robbins
dc20d4b9d4 Move the #include and #define's to the top of the file. 2002-06-14 15:56:52 +00:00
Tim J. Robbins
4efc23dabf Bump the size of the equivalence set to NCHARS; this file was left out
of a previous commit implementing equivalence classes.
2002-06-14 15:53:38 +00:00
Tim J. Robbins
6eb0710e98 Sort sections. Avoid using "The -? option" at the start of option descriptions. 2002-06-14 10:11:41 +00:00
Tim J. Robbins
e73c3d279c Don't treat the trailing ']' of an equivalence class expression as a
character in the set. tr -d '[=a=]' was deleting ]'s as well as a's.
Noticed by the textutils test suite.
2002-06-14 09:53:11 +00:00
Tim J. Robbins
dfac4f3695 Add the P1003.1-2001 -C option which complements the set of characters
(not byte values) specified by the first string argument.
2002-06-14 08:58:30 +00:00
Tim J. Robbins
85f6c317ea Implement support for equivalence classes ([=e=]) when the mapping is
one-to-one (SUSv3)
2002-06-14 07:37:08 +00:00
Warner Losh
3f330d7d1a remove __P 2002-03-22 01:42:45 +00:00
Alfred Perlstein
40e8dd712c properly handle zero length first string when doing -c
PR: 34663
MFC After: 3 days
2002-03-02 10:36:37 +00:00
Mark Murray
787324755c WARNS=2 fixes, use __FBSDID(), kill register keyword. 2001-12-11 23:36:25 +00:00
Ruslan Ermilov
d628d776c4 mdoc(7) police: utilize the new .Ex macro. 2001-08-15 09:09:47 +00:00
Ruslan Ermilov
753d686d34 mdoc(7) police: s/BSD/.Bx/ where appropriate. 2001-08-14 10:01:54 +00:00
Dima Dorfman
f247324df7 Remove whitespace at EOL. 2001-07-15 08:06:20 +00:00
Ruslan Ermilov
9597e1c260 mdoc(7) police: -column lists require column width specifiers. 2001-07-06 10:07:43 +00:00
Ruslan Ermilov
d0353b836e mdoc(7) police: split punctuation characters + misc fixes. 2001-02-01 16:38:02 +00:00
Ruslan Ermilov
9b88faecd3 Prepare for mdoc(7)NG. 2000-12-19 16:00:12 +00:00
Ruslan Ermilov
8fe908ef0c mdoc(7) police: use the new features of the Nm macro. 2000-11-20 19:21:22 +00:00
Ruslan Ermilov
726b61ab5f Avoid use of direct troff requests in mdoc(7) manual pages. 2000-11-10 17:46:15 +00:00
Philippe Charnier
dbb9d8f826 Add DIAGNOSTICS section name 2000-03-26 15:06:46 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Nik Clayton
3be5f1f5ce Add $Id$, to make it simpler for members of the translation teams to
track.

The $Id$ line is normally at the bottom of the main comment block in the
man page, separated from the rest of the manpage by an empty comment,
like so;

     .\"    $Id$
     .\"

If the immediately preceding comment is a @(#) format ID marker then
the $Id$ will line up underneath it with no intervening blank lines.
Otherwise, an additional blank line is inserted.

Approved by:            bde
1999-07-12 20:24:20 +00:00