Commit Graph

11 Commits

Author SHA1 Message Date
tjr
b9fa8ef024 Add UTF-8-specific implementations of mbsnrtowcs() and wcsnrtombs().
These convert plain ASCII characters in-line, making them only slightly
slower than the single-byte ("NONE" encoding) version when processing
ASCII strings.
2004-07-27 06:29:48 +00:00
tjr
0bea5c0108 Add fast paths for conversion of plain ASCII characters. 2004-07-09 15:46:06 +00:00
tjr
aee8349a0c Use conversion state objects to store the accumulated wide character,
low bound, and the number of bytes remaining instead of storing the
raw byte sequence and deriving them every time mbrtowc() is called.
This is much faster -- about twice as fast in some crude benchmarks.
2004-05-17 12:32:40 +00:00
tjr
e3f042f4af Move prototypes of various encoding-related functions into a new header
file to avoid extern'ing them all over the place.
2004-05-12 14:09:04 +00:00
tjr
8f8a2ad179 Perform some basic validation of multibyte conversion state objects. 2004-04-12 13:09:18 +00:00
tjr
17077e5ae6 Don't cast away const qualifiers.
Spotted by:	bde
2004-04-10 00:27:52 +00:00
tjr
54a18fa1d6 Allow partial multibyte characters to accumulate in conversion state
objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required
by C99.
2004-04-07 10:48:19 +00:00
tjr
042225384f Fix a typo that caused mbrtowc() to always return 0. 2003-11-11 07:25:05 +00:00
tjr
1c3a3f7e26 Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement
mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left
unconverted; GB18030 will be done eventually, but GBK and UTF2 may just
be removed, as they are subsets of GB18030 and UTF-8 respectively.
2003-11-02 10:09:33 +00:00
nectar
e369901c4d Whack 28 unused variables. 2003-02-18 13:39:52 +00:00
tjr
d62abf19a3 Add a UTF-8 encoding method, which will eventually replace the antique
"UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible
for 16-bit characters, the new UTF-8 implementation is much more strict
about rejecting malformed input and also handles the full 31 bit range
of characters.
2002-10-10 22:56:18 +00:00