Tim J. Robbins
e214931fbf
Remove useless checks for characters longer than INT_MAX bytes.
2004-07-29 06:08:31 +00:00
Tim J. Robbins
ea9a9a377b
Add UTF-8-specific implementations of mbsnrtowcs() and wcsnrtombs().
These convert plain ASCII characters in-line, making them only slightly
slower than the single-byte ("NONE" encoding) version when processing
ASCII strings.
2004-07-27 06:29:48 +00:00
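The in-line ASCII handling described in this entry might look roughly like the sketch below. It is not the committed code: the function name `convert_ascii_fast` is hypothetical, and it assumes an ASCII-compatible locale encoding (as UTF-8 is) in which an ASCII byte and its wide character have the same value. Bytes below 0x80 are stored directly; anything else goes through the standard mbrtowc().

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert up to n bytes of src into at most len wide characters.
 * Returns the number of wide characters stored, or (size_t)-1 if a
 * multibyte sequence is invalid or cut off at the end of the input.
 * (Hypothetical helper; assumes an ASCII-compatible encoding.)
 */
static size_t
convert_ascii_fast(wchar_t *dst, size_t len, const char *src, size_t n)
{
	mbstate_t ps;
	size_t used, nwc = 0;

	assert(dst != NULL && src != NULL);
	memset(&ps, 0, sizeof(ps));
	while (n > 0 && nwc < len) {
		unsigned char ch = (unsigned char)*src;

		if (ch < 0x80) {
			/* Fast path: an ASCII byte is its own wide character. */
			dst[nwc++] = (wchar_t)ch;
			src++;
			n--;
			continue;
		}
		/* Slow path: let the locale decode one multibyte character. */
		used = mbrtowc(&dst[nwc], src, n, &ps);
		/* 0 is not expected here: null bytes take the fast path. */
		if (used == 0 || used == (size_t)-1 || used == (size_t)-2)
			return ((size_t)-1);
		src += used;
		n -= used;
		nwc++;
	}
	return (nwc);
}
```

Because every error return leaves the loop, the conversion state is always initial when the fast path runs, so skipping mbrtowc() for ASCII bytes is safe in this sketch.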
Tim J. Robbins
6740cd8374
Return the correct value when dst == NULL and conversion has stopped after
nwc dropping to zero.
2004-07-22 02:57:29 +00:00
Tim J. Robbins
1949a3470f
Implement the GNU extensions of mbsnrtowcs() and wcsnrtombs(). These are
convenient when the source string isn't null-terminated.
Implement the other conversion functions (mbstowcs(), mbsrtowcs(), wcstombs(),
wcsrtombs()) in terms of these new functions.
2004-07-21 10:54:57 +00:00
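A minimal usage sketch of mbsnrtowcs() on a buffer that is not null-terminated, assuming a system that declares the function in <wchar.h> (it was later standardized in POSIX.1-2008). The wrapper name `convert_prefix` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert the first nbytes of buf, which need not be null-terminated,
 * into at most dstlen wide characters. Returns what mbsnrtowcs()
 * returns: the number of wide characters stored, or (size_t)-1.
 */
static size_t
convert_prefix(wchar_t *dst, size_t dstlen, const char *buf, size_t nbytes)
{
	const char *src = buf;
	mbstate_t ps;

	assert(dst != NULL && buf != NULL);
	memset(&ps, 0, sizeof(ps));	/* start in the initial state */
	return (mbsnrtowcs(dst, &src, nbytes, dstlen, &ps));
}
```

The nbytes limit is what makes this usable on slices of a larger buffer; mbsrtowcs() would keep reading until it found a null byte.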
Tim J. Robbins
550473de5b
Add fast paths for conversion of plain ASCII characters.
2004-07-09 15:46:06 +00:00
Tim J. Robbins
ee446de0b1
Add a function to iterate over all characters in a particular character
class. This is necessary in order to implement tr(1) efficiently in
multibyte locales, since the brute force method of finding all characters
in a class is infeasible with a 32-bit (or wider) wchar_t.
2004-07-08 06:43:37 +00:00
Ruslan Ermilov
b9384efc1c
Markup nits.
2004-07-05 06:39:03 +00:00
Ruslan Ermilov
1c85060a13
Sort SEE ALSO references (in dictionary order, ignoring case).
2004-07-04 20:55:50 +00:00
Ruslan Ermilov
1a0a934547
Mechanically kill hard sentence breaks.
2004-07-02 23:52:20 +00:00
Ruslan Ermilov
d37ea99837
Removed trailing whitespace.
2004-07-02 19:07:33 +00:00
Ruslan Ermilov
33992dc0ed
Markup, grammar, and spelling fixes.
2004-06-30 20:09:10 +00:00
Ruslan Ermilov
bd486f888e
Fixed a typo.
2004-06-30 19:32:41 +00:00
Tim J. Robbins
ddc1eded85
Prefix the names of members of _RuneLocale and its sub-structures
with ``__'' to avoid polluting the namespace. This doesn't change the
documented rune interface at all, but breaks applications that accessed
_RuneLocale directly.
2004-06-23 07:01:44 +00:00
Mike Pritchard
c20133b039
Spelling fixes.
2004-06-21 19:54:56 +00:00
Tim J. Robbins
c05bd9ae25
Buffer partial wide characters more efficiently: instead of storing the
multibyte representation in conversion state objects, store the
accumulated wide character, set number and number of bytes remaining
to avoid having to derive them every time mbrtowc() is called.
2004-05-27 10:54:34 +00:00
Tim J. Robbins
18b2031298
Scan the source string for invalid wide characters in wcsrtombs()
in the dst == NULL case.
2004-05-25 10:45:24 +00:00
Tim J. Robbins
675e7ddbee
Grab all the information we need about a character with one call to
__maskrune() instead of one direct call and one through iswprint().
2004-05-23 13:20:09 +00:00
Tim J. Robbins
5e44d7ebe1
Use conversion state objects to store the accumulated wide character,
low bound, and the number of bytes remaining instead of storing the
raw byte sequence and deriving them every time mbrtowc() is called.
This is much faster -- about twice as fast in some crude benchmarks.
2004-05-17 12:32:40 +00:00
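The buffering scheme this entry describes can be illustrated with a small incremental UTF-8 decoder. This is a sketch, not the committed code: the names `struct utf8_state` and `utf8_feed` are hypothetical, and it handles sequences of up to four bytes without checking for surrogates or values above U+10FFFF. The state keeps the accumulated character, the low bound, and the count of bytes still expected, so nothing has to be re-derived from raw bytes on the next call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion state: what has been decoded so far. */
struct utf8_state {
	uint32_t ch;		/* accumulated character bits */
	uint32_t lowbound;	/* smallest value legal for this length */
	int want;		/* continuation bytes expected; 0 = initial */
};

/*
 * Feed one byte. Returns 1 and stores the character in *out when it is
 * complete, 0 when more bytes are needed, and -1 on invalid input.
 */
static int
utf8_feed(struct utf8_state *st, unsigned char b, uint32_t *out)
{
	assert(st != NULL && out != NULL);
	if (st->want == 0) {
		if (b < 0x80) {			/* ASCII completes at once */
			*out = b;
			return (1);
		} else if ((b & 0xe0) == 0xc0) {
			st->ch = b & 0x1f; st->want = 1; st->lowbound = 0x80;
		} else if ((b & 0xf0) == 0xe0) {
			st->ch = b & 0x0f; st->want = 2; st->lowbound = 0x800;
		} else if ((b & 0xf8) == 0xf0) {
			st->ch = b & 0x07; st->want = 3; st->lowbound = 0x10000;
		} else
			return (-1);		/* invalid lead byte */
		return (0);
	}
	if ((b & 0xc0) != 0x80) {		/* not a continuation byte */
		st->want = 0;
		return (-1);
	}
	st->ch = (st->ch << 6) | (b & 0x3f);
	if (--st->want > 0)
		return (0);
	if (st->ch < st->lowbound)		/* overlong encoding */
		return (-1);
	*out = st->ch;
	return (1);
}
```

Feeding the bytes E2 82 AC one at a time returns 0, 0, then 1 with the character U+20AC; the overlong pair C0 80 is rejected by the low-bound check.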
Tim J. Robbins
6107476759
Use a simpler and faster buffering scheme for partial multibyte characters.
2004-05-17 11:16:14 +00:00
Tim J. Robbins
b666b593eb
Use a simpler, faster buffering scheme for partial characters in mbrtowc().
2004-05-14 15:40:47 +00:00
Tim J. Robbins
ea4ac135ff
Allow encoding modules to override the default implementations of
mbsrtowcs() and wcsrtombs(). Provide a fast implementation for the
trivial "NONE" encoding.
2004-05-13 11:20:27 +00:00
Tim J. Robbins
f789f94dbb
Fix braino in previous: check that the second byte in the character
buffer is non-null when the character is two bytes long, not when
the buffer is two bytes long.
2004-05-13 03:08:28 +00:00
Tim J. Robbins
6155c34adf
Reduce overhead by calling internal versions of the multibyte conversion
functions directly wherever possible.
2004-05-12 14:26:54 +00:00
Tim J. Robbins
2051a8f2d5
Move prototypes of various encoding-related functions into a new header
file to avoid extern'ing them all over the place.
2004-05-12 14:09:04 +00:00
Tim J. Robbins
88af941a73
In the absence of proper validation, at least check that null bytes
do not appear as anything but the first byte of a multibyte character.
2004-05-11 14:08:22 +00:00
Tim J. Robbins
45a11576f3
Use a binary search to find the range containing a character in
RuneRange arrays. This is much faster when there are hundreds of
ranges (as is the case in UTF-8 locales) and was inspired by a
similar change made by Apple in Darwin.
2004-05-09 13:04:49 +00:00
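A binary search over a sorted array of character ranges, as this entry describes, can be sketched as follows. The `struct range` type, the `find_range` function, and the sample table are hypothetical stand-ins rather than the real RuneRange definitions; the sketch assumes the ranges are sorted by lower bound and do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one range entry: an inclusive [min, max] span. */
struct range {
	unsigned long min;
	unsigned long max;
	unsigned long type;	/* illustrative payload for the span */
};

/* Illustrative table: ASCII digits, upper-case and lower-case letters. */
static const struct range sample_ranges[] = {
	{ 0x30, 0x39, 1 },
	{ 0x41, 0x5a, 2 },
	{ 0x61, 0x7a, 3 },
};

/*
 * Return the range containing c, or NULL if no range does.
 * Requires ranges[] to be sorted by min with no overlaps.
 */
static const struct range *
find_range(const struct range *ranges, size_t n, unsigned long c)
{
	size_t lo = 0, hi = n;	/* search the half-open interval [lo, hi) */

	assert(ranges != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (c < ranges[mid].min)
			hi = mid;		/* c lies below this range */
		else if (c > ranges[mid].max)
			lo = mid + 1;		/* c lies above this range */
		else
			return (&ranges[mid]);
	}
	return (NULL);
}
```

With hundreds of ranges, each lookup takes on the order of log2(n) comparisons instead of a linear scan.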
Andrey A. Chernov
28aec5a68c
Rewrite split_lines() to operate safely
PR: 62694
Submitted by: moulin p <moulin.p@calyopea.com>
2004-04-25 19:56:50 +00:00
Tim J. Robbins
fc813796d2
Perform some basic validation of multibyte conversion state objects.
2004-04-12 13:09:18 +00:00
Tim J. Robbins
c282a0a1ed
Remove a nonsensical remark about byte order markers in UTF-8 streams.
2004-04-12 12:58:41 +00:00
Tim J. Robbins
78c4a3f225
Document the meaning of the zero return value.
2004-04-11 05:19:19 +00:00
David Xu
6464650388
Fix a typo. I was locked out for two days from my machine.
2004-04-10 14:36:57 +00:00
Tim J. Robbins
fa02ee78c8
Don't cast away const qualifiers.
Spotted by: bde
2004-04-10 00:27:52 +00:00
Tim J. Robbins
8b8109275c
Update manual pages for change to C99 mbrtowc() semantics.
2004-04-08 09:59:02 +00:00
Tim J. Robbins
ca2dae426e
Allow partial multibyte characters to accumulate in conversion state
objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required
by C99.
2004-04-07 10:48:19 +00:00
Tim J. Robbins
e97e856274
Begin conversions for sgetrune() and sputrune() in the initial
conversion state.
2004-04-07 09:49:10 +00:00
Tim J. Robbins
dc763237da
Prepare to handle state-dependent encodings. This mainly involves not
taking shortcuts when it comes to storing and passing around conversion
states.
2004-04-07 09:47:56 +00:00
Tim J. Robbins
ed870c6a8e
Begin in the initial shift state in mbstowcs() and wcstombs().
(This change is non-functional since nothing uses states yet.)
2004-04-07 08:33:23 +00:00
Tim J. Robbins
74f90def09
Prepare to handle state-dependent encodings. This mainly involves not
taking shortcuts when it comes to storing and passing around conversion
states.
2004-04-06 13:14:03 +00:00
Tim J. Robbins
4fb9e805dc
Remove support for emulating mbrtowc() and wcrtomb() in terms of the
old rune interface now that it is no longer needed.
2004-04-04 11:31:29 +00:00
Tim J. Robbins
4f6d4aa30d
Reimplement the GB18030 encoding method using the new-style (mbrtowc()/
wcrtomb()) interface.
2004-04-04 11:00:42 +00:00
Tim J. Robbins
54c61797df
Reimplement the deprecated UTF2 encoding method using the UTF-8 code
as a base. mbrtowc() and wcrtomb() are now implemented directly
instead of being emulated with sgetrune() and sputrune().
2004-04-04 10:49:45 +00:00
Tim J. Robbins
6de4bcc717
Add cross-references to isideogram(3), isphonogram(3), isrune(3),
isspecial(3) and wctype(3).
2004-03-30 08:11:57 +00:00
Tim J. Robbins
32d9553d83
Add basic manual pages for isideogram(), isphonogram(), isrune()
and isspecial().
2004-03-30 07:23:54 +00:00
Tim J. Robbins
bee1de57ca
Trim cross-references.
2004-03-30 07:19:35 +00:00
Tim J. Robbins
ba6699086d
Document the isnumber() and ishexnumber() functions, and explain how they
differ (at least in theory) from isdigit() and isxdigit().
2004-03-30 07:02:04 +00:00
Tim J. Robbins
ab02b93f75
Remove duplicate MLINK.
2004-03-29 21:46:52 +00:00
Tim J. Robbins
97062607cd
Recognize the "rune" character class in wctype().
2004-03-27 08:59:21 +00:00
Diomidis Spinellis
3f0a01ea87
Make consistent with the better written wcsrtombs function:
- Fix syntax
- Remove the (slightly wrong) duplicate explanation of the error condition
- Change reference to invalid multibyte character into invalid wide character
2004-02-27 15:03:22 +00:00
Andrey A. Chernov
41ddc53bca
LC_ALL does not always take priority over other LC_*
Obtained from: NetBSD
PR: 62047
2004-01-31 19:15:32 +00:00
Andrey A. Chernov
e6e9fb749a
Add reference to environ(7)
2004-01-29 09:27:24 +00:00
Jacques Vidrine
84d9142f58
Remove unused variables and function declarations. Add missing headers.
2004-01-06 18:26:15 +00:00
Andrey A. Chernov
ad4688e131
Properly advance "x/y/z" form slash-pointers in some rare cases
PR: 60539
2003-12-24 10:16:46 +00:00
Andrey A. Chernov
6abda1f093
First byte of GBK-like sequences is 0x81, not 0x80
2003-12-19 12:54:42 +00:00
Tim J. Robbins
40c5c1f8a1
Set __mbrtowc and __wcrtomb correctly when changing to the C/POSIX locale.
Save __mbrtowc and __wcrtomb and restore them when changing back to
the cached locale.
Reported by: perky
2003-12-08 23:52:22 +00:00
Tim J. Robbins
bc0b3a1800
Split multibyte(3) into separate manual pages for each function.
Instead of just deleting it, turn the original page into a general
overview of the multibyte character conversion functions, somewhat
similar to stdio(3).
2003-12-07 06:33:52 +00:00
Tim J. Robbins
da44487bd7
Split the documentation for localeconv() off into a separate manual page.
2003-12-07 06:00:00 +00:00
Tim J. Robbins
8962b7a518
Update cross references after utf2/euc move.
2003-11-15 02:26:04 +00:00
Tim J. Robbins
f76c65296c
Remove section 4 versions of these manual pages, they have been
moved into section 5.
2003-11-15 02:15:25 +00:00
Tim J. Robbins
93584b12e6
Install the section 5 versions of EUC and UTF2 manual pages instead of
the section 4 versions.
2003-11-15 02:13:09 +00:00
Tim J. Robbins
ee0694adb9
Update the EUC and UTF2 manual pages for their new home in section 5.
These have been repo-copied from euc.4 and utf2.4.
2003-11-15 01:54:46 +00:00
Tim J. Robbins
b1c572ad5b
Fix a typo that caused mbrtowc() to always return 0.
2003-11-11 07:25:05 +00:00
Tim J. Robbins
cc7a3285a5
Add one more cross-reference to gb2312(5).
2003-11-08 03:23:11 +00:00
Tim J. Robbins
16854d3c8f
Add cross-references to new gb2312(5) manual page.
2003-11-08 03:07:56 +00:00
Tim J. Robbins
e31d6d8149
Add a fairly simple manual page for the new GB2312 encoding.
2003-11-08 03:02:45 +00:00
Tim J. Robbins
9e0bd333f0
Remove unused #includes.
2003-11-08 02:58:37 +00:00
Tim J. Robbins
eb402e14d8
Use __inline instead of inline.
2003-11-08 02:56:03 +00:00
Tim J. Robbins
c2f9330393
Refer to wide characters instead of runes. Remove redundant example locale.
Catch up with renaming of "Japanese" to "ja_JP.eucJP". Comment out the
statement that EUC is provided for compatibility with UNIX-based systems;
this is not a very good opening paragraph.
2003-11-08 02:52:31 +00:00
Tim J. Robbins
5d9c483db1
Refer to wide characters instead of runes.
2003-11-08 02:46:02 +00:00
David Xu
6d7a04b013
Add gb2312 encoding.
2003-11-05 22:52:51 +00:00
Tim J. Robbins
90c7d99f5b
Implement mbrtowc() and wcrtomb() directly (sync with big5.c).
2003-11-05 07:56:45 +00:00
Tim J. Robbins
02f4f60ad5
Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement
mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left
unconverted; GB18030 will be done eventually, but GBK and UTF2 may just
be removed, as they are subsets of GB18030 and UTF-8 respectively.
2003-11-02 10:09:33 +00:00
Tim J. Robbins
d390e53270
Remove TODO comment about creating a macro version of towctrans().
Remove unnecessary inclusion of <ctype.h>.
2003-11-01 08:20:58 +00:00
Tim J. Robbins
d4f6cd06dd
Allow mbrtowc() and wcrtomb() to be implemented directly, instead of
as wrappers around the deprecated 4.4BSD rune functions. This paves the
way for state-dependent encodings, which the rune API does not support.
- Add __emulated_sgetrune() and __emulated_sputrune(), which are
implementations of sgetrune() and sputrune() in terms of
mbrtowc() and wcrtomb().
- Rename the old rune-wrapper mbrtowc() and wcrtomb() functions to
__emulated_mbrtowc() and __emulated_wcrtomb().
- Add __mbrtowc and __wcrtomb function pointers, which point to the
current locale's conversion functions, or the __emulated versions.
- Implement mbrtowc() and wcrtomb() as calls to these function pointers.
- Make the "NONE" encoding implement mbrtowc() and wcrtomb() directly.
All of this emulation mess will be removed, together with rune support,
in FreeBSD 6.
2003-11-01 05:13:13 +00:00
Tim J. Robbins
1e8742e9cd
Don't bother passing a freshly-zeroed mbstate to mbsrtowcs() etc.
when the current implementation won't use it, anyway. Just pass NULL.
This will need to be changed when state-dependent encodings are
supported, but there's no need to take the performance hit
in the meantime.
2003-10-31 13:29:00 +00:00
Tim J. Robbins
cf651e6b5c
Implement fgetrune(), fungetrune() and fputrune() as wrappers around
fgetwc(), ungetwc() and fputwc().
2003-10-31 10:55:19 +00:00
Tim J. Robbins
4539e95a0f
Remove incomplete support for running FreeBSD userland on old NetBSD kernels
lacking the issetugid() and utrace() syscalls.
2003-10-29 10:45:01 +00:00
Ruslan Ermilov
fe08efe680
mdoc(7): Use the new feature of the .In macro.
2003-09-08 19:57:22 +00:00
Tim J. Robbins
4ae3aa59ef
Remove an unused and incorrect prototype for _none_init().
2003-09-05 09:01:31 +00:00
Tim J. Robbins
e43ffa4159
Fix the case of the encoding name in the ENCODING line. Names are
case-sensitive, and MSKANJI does not work.
2003-08-10 11:41:38 +00:00
Tim J. Robbins
dcb2df4c22
Cross-reference gbk(5).
2003-08-10 11:38:28 +00:00
Tim J. Robbins
dd5e8fdef8
Cross-reference gbk(5) now that it exists. Fix a copy & paste error:
one occurrence of GB 18030 should have been 11383.
2003-08-10 11:36:42 +00:00
Tim J. Robbins
f6d8a447d1
Add a fairly minimal manual page for the GBK encoding.
2003-08-10 11:34:35 +00:00
Tim J. Robbins
9e09ac8597
Add a cross reference to Unicode 3.0.
2003-08-10 11:26:18 +00:00
Tim J. Robbins
39e2a81e3f
Add cross references to the new character encoding manual pages,
and to mbsinit(3) while I'm at it.
2003-08-10 09:25:52 +00:00
Tim J. Robbins
8ca5fa518c
Add manual pages for the BIG5, GB18030 and MSKanji encodings. These may
need to be fleshed out a little, especially big5(5).
2003-08-10 09:23:51 +00:00
Tim J. Robbins
b85aa4e3f7
Implement mblen(s, n) as mbtowc(NULL, s, n) to avoid calling sgetrune()
and to simplify things. This is only valid until we start supporting
state-dependent encodings.
2003-08-07 09:34:51 +00:00
Tim J. Robbins
b69a98d6d3
Implement mbstowcs() as a wrapper around mbsrtowcs(), and wcstombs()
as a wrapper around wcsrtombs().
2003-08-07 08:04:01 +00:00
Tim J. Robbins
998e124837
Implement mbtowc() in terms of mbrtowc(), and wctomb() in terms of wcrtomb().
2003-08-07 07:59:36 +00:00
Tim J. Robbins
dab4fca49b
Implement btowc() in terms of mbrtowc() instead of sgetrune(), and
wctob() in terms of wcrtomb() instead of sputrune(). There should be
no functional differences, but there may be a small performance hit
because we make an extra function call.
The aim here is to have as few functions as possible calling
s{get,put}rune() to make it easier to remove them in the future.
2003-08-07 07:45:35 +00:00
Andrey A. Chernov
a9d25ab17f
Restore including of "collate.h", for its own prototype (mis)match detection
2003-08-03 19:28:23 +00:00
Andrey A. Chernov
8841d0081c
Remove commented out and never used code
2003-08-03 05:20:31 +00:00
Andrey A. Chernov
17f67afe28
Remove __collate_range_cmp() stabilization, it conflicts with ranges
2003-08-03 04:40:40 +00:00
Andrey A. Chernov
a03081087c
Add support for gb18030 encoding
PR: 51729
Submitted by: Kang Liu <liukang@bjpu.edu.cn>
2003-07-29 07:52:44 +00:00
Andrey A. Chernov
8b2749e901
Add const to __setrunelocale prototype
2003-07-06 04:01:09 +00:00
Andrey A. Chernov
68d429c3fc
Reorganize wrapper around setrunelocale() to mark it as deprecated
in FreeBSD 6
2003-07-06 02:03:37 +00:00
Alexey Zelkin
683fe11379
. style(9)
. fix/add comments (to cover changes done thru last 20 months)
. extend monetary testcase to cover int_* values
2003-06-26 10:46:16 +00:00
Alexey Zelkin
fca2738d67
Reduce code duplication by separating _PathLocale detection code into
internal helper function.
2003-06-25 22:42:33 +00:00
Alexey Zelkin
93c847344b
Move _PathLocale declaration to more logical place (setlocale.c)
2003-06-25 22:34:13 +00:00
Alexey Zelkin
d8d4841398
Catch up with _PATH_LOCALE move from rune.h to paths.h
2003-06-25 22:31:42 +00:00
Tim J. Robbins
77156cb782
Mark the following interfaces as OBSOLETE_IN_6:
fgetrune(), fputrune(), fungetrune(), mbrune(), mbrrune(), mbmb(),
setinvalidrune(), UTF2 encoding method.
These have been marked as being deprecated in their manual pages since 5.0,
and their use causes a linker warning.
2003-06-13 07:13:54 +00:00