Commit Graph

434 Commits

Author SHA1 Message Date
tjr
0bea5c0108 Add fast paths for conversion of plain ASCII characters. 2004-07-09 15:46:06 +00:00
tjr
3a9d81b253 Add a function to iterate over all characters in a particular character
class. This is necessary in order to implement tr(1) efficiently in
multibyte locales, since the brute force method of finding all characters
in a class is infeasible with a 32-bit (or wider) wchar_t.
2004-07-08 06:43:37 +00:00
ru
b5e1c67f19 Markup nits. 2004-07-05 06:39:03 +00:00
ru
6651f20e0d Sort SEE ALSO references (in dictionary order, ignoring case). 2004-07-04 20:55:50 +00:00
ru
01548ace15 Mechanically kill hard sentence breaks. 2004-07-02 23:52:20 +00:00
ru
4b39413aeb Removed trailing whitespace. 2004-07-02 19:07:33 +00:00
ru
95168a499a Markup, grammar, and spelling fixes. 2004-06-30 20:09:10 +00:00
ru
6ad65dd7e0 Fixed a typo. 2004-06-30 19:32:41 +00:00
tjr
d04fd4700f Prefix the names of members of _RuneLocale and its sub-structures
with ``__'' to avoid polluting the namespace. This doesn't change the
documented rune interface at all, but breaks applications that accessed
_RuneLocale directly.
2004-06-23 07:01:44 +00:00
mpp
98d43ce6f1 Spelling fixes. 2004-06-21 19:54:56 +00:00
tjr
fb60260f98 Buffer partial wide characters more efficiently: instead of storing the
multibyte representation in conversion state objects, store the
accumulated wide character, set number and number of bytes remaining
to avoid having to derive them every time mbrtowc() is called.
2004-05-27 10:54:34 +00:00
tjr
0efcf2d09b Scan the source string for invalid wide characters in wcsrtombs()
in the dst == NULL case.
2004-05-25 10:45:24 +00:00
tjr
ea28a65744 Grab all the information we need about a character with one call to
__maskrune() instead of one direct call and one through iswprint().
2004-05-23 13:20:09 +00:00
tjr
aee8349a0c Use conversion state objects to store the accumulated wide character,
low bound, and the number of bytes remaining instead of storing the
raw byte sequence and deriving them every time mbrtowc() is called.
This is much faster -- about twice as fast in some crude benchmarks.
2004-05-17 12:32:40 +00:00
tjr
b40b6a2d84 Use a simpler and faster buffering scheme for partial multibyte characters. 2004-05-17 11:16:14 +00:00
tjr
9e176d6b08 Use a simpler, faster buffering scheme for partial characters in mbrtowc(). 2004-05-14 15:40:47 +00:00
tjr
3aa9288a48 Allow encoding modules to override the default implementations of
mbsrtowcs() and wcsrtombs(). Provide a fast implementation for the
trivial "NONE" encoding.
2004-05-13 11:20:27 +00:00
tjr
e442306798 Fix braino in previous: check that the second byte in the character
buffer is non-null when the character is two bytes long, not when
the buffer is two bytes long.
2004-05-13 03:08:28 +00:00
tjr
5ad27cd64f Reduce overhead by calling internal versions of the multibyte conversion
functions directly wherever possible.
2004-05-12 14:26:54 +00:00
tjr
e3f042f4af Move prototypes of various encoding-related functions into a new header
file to avoid extern'ing them all over the place.
2004-05-12 14:09:04 +00:00
tjr
d79e71957e In the absence of proper validation, at least check that null bytes
do not appear as anything but the first byte of a multibyte character.
2004-05-11 14:08:22 +00:00
tjr
a8117b04ca Use a binary search to find the range containing a character in
RuneRange arrays. This is much faster when there are hundreds of
ranges (as is the case in UTF-8 locales) and was inspired by a
similar change made by Apple in Darwin.
2004-05-09 13:04:49 +00:00
ache
a7c84134a6 Rewrite split_lines() to operate safely
PR:             62694
Submitted by:   moulin p <moulin.p@calyopea.com>
2004-04-25 19:56:50 +00:00
tjr
8f8a2ad179 Perform some basic validation of multibyte conversion state objects. 2004-04-12 13:09:18 +00:00
tjr
e26da574d5 Remove a nonsensical remark about byte order markers in UTF-8 streams. 2004-04-12 12:58:41 +00:00
tjr
e768e0d54f Document the meaning of the zero return value. 2004-04-11 05:19:19 +00:00
davidxu
289170412b Fix a typo. I was locked out for two days from my machine. 2004-04-10 14:36:57 +00:00
tjr
17077e5ae6 Don't cast away const qualifiers.
Spotted by:	bde
2004-04-10 00:27:52 +00:00
tjr
0ca2900d48 Update manual pages for change to C99 mbrtowc() semantics. 2004-04-08 09:59:02 +00:00
tjr
54a18fa1d6 Allow partial multibyte characters to accumulate in conversion state
objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required
by C99.
2004-04-07 10:48:19 +00:00
tjr
226e976dd7 Begin conversions for sgetrune() and sputrune() in the initial
conversion state.
2004-04-07 09:49:10 +00:00
tjr
47b6d3f343 Prepare to handle state-dependent encodings. This mainly involves not
taking shortcuts when it comes to storing and passing around conversion
states.
2004-04-07 09:47:56 +00:00
tjr
c3bbcd6ef6 Begin in the initial shift state in mbstowcs() and wcstombs().
(This change is non-functional since nothing uses states yet.)
2004-04-07 08:33:23 +00:00
tjr
4212f9711f Prepare to handle state-dependent encodings. This mainly involves not
taking shortcuts when it comes to storing and passing around conversion
states.
2004-04-06 13:14:03 +00:00
tjr
5642b0bbf4 Remove support for emulating mbrtowc() and wcrtomb() in terms of the
old rune interface now that it is no longer needed.
2004-04-04 11:31:29 +00:00
tjr
d98eda520c Reimplement the GB18030 encoding method using the new-style (mbrtowc()/
wcrtomb()) interface.
2004-04-04 11:00:42 +00:00
tjr
dd0763020d Reimplement the deprecated UTF2 encoding method using the UTF-8 code
as a base. mbrtowc() and wcrtomb() are now implemented directly
instead of being emulatedi with sgetrune() and sputrune().
2004-04-04 10:49:45 +00:00
tjr
1a32baef3b Add cross-references to isideogram(3), isphonogram(3), isrune(3),
isspecial(3) and wctype(3).
2004-03-30 08:11:57 +00:00
tjr
0e393613f2 Add basic manual pages for isideogram(), isphonogram(), isrune()
and isspecial().
2004-03-30 07:23:54 +00:00
tjr
94fd478493 Trim cross-references. 2004-03-30 07:19:35 +00:00
tjr
9d08058845 Document the isnumber() and ishexnumber() functions, and explain how they
differ (at least in theory) from isdigit() and isxdigit().
2004-03-30 07:02:04 +00:00
tjr
8aa9f6fafe Remove duplicate MLINK. 2004-03-29 21:46:52 +00:00
tjr
9b0a927d71 Recognize the "rune" character class in wctype(). 2004-03-27 08:59:21 +00:00
dds
02e13ba2ce Make consistent with the better written wcsrtombs function:
- Fix syntax
- Remove the (slightly wrong) duplicate explanation of the error condition
- Change reference to invalid multibyte character into invalid wide character
2004-02-27 15:03:22 +00:00
ache
d0d398da1c LC_ALL not always take priority over other LC_*
Obtained from:  NetBSD
PR:             62047
2004-01-31 19:15:32 +00:00
ache
ab3e3b27cb Add reference to environ(7) 2004-01-29 09:27:24 +00:00
nectar
c281d0e2ea Remove unused variables and function declarations. Add missing headers. 2004-01-06 18:26:15 +00:00
ache
b89dd89c31 Properly advance "x/y/z" form slash-pointers in some rare cases
PR:             60539
2003-12-24 10:16:46 +00:00
ache
750b0b565d First byte of GBK-like sequences is 0x81, not 0x80 2003-12-19 12:54:42 +00:00
tjr
f9c332bd6a Set __mbrtowc and __wcrtomb correctly when changing to the C/POSIX locale.
Save __mbrtowc and __wcrtomb and restore them when changing back to
the cached locale.

Reported by:	perky
2003-12-08 23:52:22 +00:00