635 Commits

Author SHA1 Message Date
Tim J. Robbins
ea9a9a377b Add UTF-8-specific implementations of mbsnrtowcs() and wcsnrtombs().
These convert plain ASCII characters in-line, making them only slightly
slower than the single-byte ("NONE" encoding) version when processing
ASCII strings.
2004-07-27 06:29:48 +00:00
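The in-line ASCII conversion described in this commit can be sketched roughly as follows. This is not the FreeBSD code: the function name `convert_ascii_fast` is hypothetical, and the sketch assumes an encoding such as UTF-8 that is not state-dependent and in which bytes below 0x80 have the same value as wide characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not the libc implementation: convert up to nms
 * bytes of src into at most len wide characters. Returns the number of
 * wide characters stored, or (size_t)-1 on an invalid or truncated
 * multibyte sequence.
 */
size_t
convert_ascii_fast(wchar_t *dst, const char *src, size_t nms, size_t len)
{
	mbstate_t st;
	size_t n = 0, r;

	assert(dst != NULL && src != NULL);
	memset(&st, 0, sizeof(st));
	while (n < len && nms > 0) {
		unsigned char c = (unsigned char)*src;

		if (c != 0 && c < 0x80) {
			/* Fast path: plain ASCII needs no function call. */
			dst[n++] = (wchar_t)c;
			src++;
			nms--;
			continue;
		}
		/* Slow path: let the locale's converter handle the rest. */
		r = mbrtowc(&dst[n], src, nms, &st);
		if (r == (size_t)-1 || r == (size_t)-2)
			return ((size_t)-1);
		if (r == 0)
			break;		/* reached the terminating null */
		src += r;
		nms -= r;
		n++;
	}
	return (n);
}
```

For pure ASCII input the loop never leaves the fast path, which is why such a version can approach the speed of a single-byte converter.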
Tim J. Robbins
6740cd8374 Return the correct value when dst == NULL and conversion has stopped after
nwc dropping to zero.
2004-07-22 02:57:29 +00:00
Tim J. Robbins
1949a3470f Implement the GNU extensions of mbsnrtowcs() and wcsnrtombs(). These are
convenient when the source string isn't null-terminated.

Implement the other conversion functions (mbstowcs(), mbsrtowcs(), wcstombs(),
wcsrtombs()) in terms of these new functions.
2004-07-21 10:54:57 +00:00
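A minimal sketch of why mbsnrtowcs() is convenient for input that is not null-terminated: the caller bounds the number of source bytes explicitly. The wrapper name `convert_prefix` is hypothetical; only the mbsnrtowcs() call itself is the documented POSIX.1-2008 interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical wrapper: convert at most nbytes bytes of buf, which need
 * not be null-terminated, into at most dstlen wide characters. Returns
 * the number of wide characters stored, or (size_t)-1 on invalid input.
 */
size_t
convert_prefix(wchar_t *dst, size_t dstlen, const char *buf, size_t nbytes)
{
	mbstate_t st;
	const char *src = buf;

	assert(dst != NULL && buf != NULL);
	memset(&st, 0, sizeof(st));	/* start in the initial state */
	return (mbsnrtowcs(dst, &src, nbytes, dstlen, &st));
}
```

Calling `convert_prefix(out, 16, "hello world", 5)` would convert only the first five bytes, without first copying them into a terminated buffer.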
Tim J. Robbins
550473de5b Add fast paths for conversion of plain ASCII characters. 2004-07-09 15:46:06 +00:00
Tim J. Robbins
ee446de0b1 Add a function to iterate over all characters in a particular character
class. This is necessary in order to implement tr(1) efficiently in
multibyte locales, since the brute force method of finding all characters
in a class is infeasible with a 32-bit (or wider) wchar_t.
2004-07-08 06:43:37 +00:00
Ruslan Ermilov
b9384efc1c Markup nits. 2004-07-05 06:39:03 +00:00
Ruslan Ermilov
1c85060a13 Sort SEE ALSO references (in dictionary order, ignoring case). 2004-07-04 20:55:50 +00:00
Ruslan Ermilov
1a0a934547 Mechanically kill hard sentence breaks. 2004-07-02 23:52:20 +00:00
Ruslan Ermilov
d37ea99837 Removed trailing whitespace. 2004-07-02 19:07:33 +00:00
Ruslan Ermilov
33992dc0ed Markup, grammar, and spelling fixes. 2004-06-30 20:09:10 +00:00
Ruslan Ermilov
bd486f888e Fixed a typo. 2004-06-30 19:32:41 +00:00
Tim J. Robbins
ddc1eded85 Prefix the names of members of _RuneLocale and its sub-structures
with ``__'' to avoid polluting the namespace. This doesn't change the
documented rune interface at all, but breaks applications that accessed
_RuneLocale directly.
2004-06-23 07:01:44 +00:00
Mike Pritchard
c20133b039 Spelling fixes. 2004-06-21 19:54:56 +00:00
Tim J. Robbins
c05bd9ae25 Buffer partial wide characters more efficiently: instead of storing the
multibyte representation in conversion state objects, store the
accumulated wide character, set number and number of bytes remaining
to avoid having to derive them every time mbrtowc() is called.
2004-05-27 10:54:34 +00:00
Tim J. Robbins
18b2031298 Scan the source string for invalid wide characters in wcsrtombs()
in the dst == NULL case.
2004-05-25 10:45:24 +00:00
Tim J. Robbins
675e7ddbee Grab all the information we need about a character with one call to
__maskrune() instead of one direct call and one through iswprint().
2004-05-23 13:20:09 +00:00
Tim J. Robbins
5e44d7ebe1 Use conversion state objects to store the accumulated wide character,
low bound, and the number of bytes remaining instead of storing the
raw byte sequence and deriving them every time mbrtowc() is called.
This is much faster -- about twice as fast in some crude benchmarks.
2004-05-17 12:32:40 +00:00
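The idea of keeping the accumulated wide character, the low bound, and the number of bytes remaining in the conversion state can be illustrated with a small stand-alone UTF-8 decoder. Everything here is an assumption made for illustration: `struct utf8_state` and `utf8_feed` are hypothetical names, the layout is not FreeBSD's, and sequences are limited to four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion state; not the actual libc structure. */
struct utf8_state {
	uint32_t ch;		/* wide character accumulated so far */
	unsigned want;		/* continuation bytes still expected */
	uint32_t lobound;	/* smallest legal value for this length */
};

/*
 * Feed one byte. Returns 1 and stores the character in *out when a
 * character completes, 0 when more bytes are needed, and -1 on
 * invalid input (the state is reset in that case).
 */
int
utf8_feed(struct utf8_state *st, unsigned char b, uint32_t *out)
{
	assert(st != NULL && out != NULL);
	if (st->want == 0) {
		if (b < 0x80) {
			*out = b;
			return (1);
		}
		if ((b & 0xe0) == 0xc0) {
			st->ch = b & 0x1f; st->want = 1; st->lobound = 0x80;
		} else if ((b & 0xf0) == 0xe0) {
			st->ch = b & 0x0f; st->want = 2; st->lobound = 0x800;
		} else if ((b & 0xf8) == 0xf0) {
			st->ch = b & 0x07; st->want = 3; st->lobound = 0x10000;
		} else
			return (-1);
		return (0);
	}
	if ((b & 0xc0) != 0x80) {
		st->want = 0;
		return (-1);
	}
	/* Nothing is re-derived: just shift in six more bits. */
	st->ch = (st->ch << 6) | (b & 0x3f);
	if (--st->want > 0)
		return (0);
	if (st->ch < st->lobound)
		return (-1);	/* overlong encoding */
	*out = st->ch;
	return (1);
}
```

Because the state already holds the partial value and the remaining count, each call does constant work per byte instead of re-parsing a buffered byte sequence.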
Tim J. Robbins
6107476759 Use a simpler and faster buffering scheme for partial multibyte characters. 2004-05-17 11:16:14 +00:00
Tim J. Robbins
b666b593eb Use a simpler, faster buffering scheme for partial characters in mbrtowc(). 2004-05-14 15:40:47 +00:00
Tim J. Robbins
ea4ac135ff Allow encoding modules to override the default implementations of
mbsrtowcs() and wcsrtombs(). Provide a fast implementation for the
trivial "NONE" encoding.
2004-05-13 11:20:27 +00:00
Tim J. Robbins
f789f94dbb Fix braino in previous: check that the second byte in the character
buffer is non-null when the character is two bytes long, not when
the buffer is two bytes long.
2004-05-13 03:08:28 +00:00
Tim J. Robbins
6155c34adf Reduce overhead by calling internal versions of the multibyte conversion
functions directly wherever possible.
2004-05-12 14:26:54 +00:00
Tim J. Robbins
2051a8f2d5 Move prototypes of various encoding-related functions into a new header
file to avoid extern'ing them all over the place.
2004-05-12 14:09:04 +00:00
Tim J. Robbins
88af941a73 In the absence of proper validation, at least check that null bytes
do not appear as anything but the first byte of a multibyte character.
2004-05-11 14:08:22 +00:00
Tim J. Robbins
45a11576f3 Use a binary search to find the range containing a character in
RuneRange arrays. This is much faster when there are hundreds of
ranges (as is the case in UTF-8 locales) and was inspired by a
similar change made by Apple in Darwin.
2004-05-09 13:04:49 +00:00
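A binary search over sorted, non-overlapping character ranges might look like the sketch below. The `struct char_range` type and `find_range` function are hypothetical stand-ins; the actual RuneRange layout in libc may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range entry; the real rune range layout may differ. */
struct char_range {
	unsigned long min;	/* first code in the range, inclusive */
	unsigned long max;	/* last code in the range, inclusive */
};

/*
 * Find the range containing c in an array sorted by min with no
 * overlapping ranges. Returns NULL when no range contains c.
 */
const struct char_range *
find_range(const struct char_range *ranges, size_t nranges, unsigned long c)
{
	size_t lo = 0, hi = nranges;

	assert(ranges != NULL || nranges == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (c < ranges[mid].min)
			hi = mid;		/* c lies left of this range */
		else if (c > ranges[mid].max)
			lo = mid + 1;		/* c lies right of this range */
		else
			return (&ranges[mid]);
	}
	return (NULL);
}
```

With hundreds of ranges, this needs on the order of ten comparisons per lookup rather than a scan of the whole array.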
Andrey A. Chernov
28aec5a68c Rewrite split_lines() to operate safely
PR:             62694
Submitted by:   moulin p <moulin.p@calyopea.com>
2004-04-25 19:56:50 +00:00
Tim J. Robbins
fc813796d2 Perform some basic validation of multibyte conversion state objects. 2004-04-12 13:09:18 +00:00
Tim J. Robbins
c282a0a1ed Remove a nonsensical remark about byte order markers in UTF-8 streams. 2004-04-12 12:58:41 +00:00
Tim J. Robbins
78c4a3f225 Document the meaning of the zero return value. 2004-04-11 05:19:19 +00:00
David Xu
6464650388 Fix a typo. I was locked out for two days from my machine. 2004-04-10 14:36:57 +00:00
Tim J. Robbins
fa02ee78c8 Don't cast away const qualifiers.
Spotted by:	bde
2004-04-10 00:27:52 +00:00
Tim J. Robbins
8b8109275c Update manual pages for change to C99 mbrtowc() semantics. 2004-04-08 09:59:02 +00:00
Tim J. Robbins
ca2dae426e Allow partial multibyte characters to accumulate in conversion state
objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required
by C99.
2004-04-07 10:48:19 +00:00
Tim J. Robbins
e97e856274 Begin conversions for sgetrune() and sputrune() in the initial
conversion state.
2004-04-07 09:49:10 +00:00
Tim J. Robbins
dc763237da Prepare to handle state-dependent encodings. This mainly involves not
taking shortcuts when it comes to storing and passing around conversion
states.
2004-04-07 09:47:56 +00:00
Tim J. Robbins
ed870c6a8e Begin in the initial shift state in mbstowcs() and wcstombs().
(This change is non-functional since nothing uses states yet.)
2004-04-07 08:33:23 +00:00
Tim J. Robbins
74f90def09 Prepare to handle state-dependent encodings. This mainly involves not
taking shortcuts when it comes to storing and passing around conversion
states.
2004-04-06 13:14:03 +00:00
Tim J. Robbins
4fb9e805dc Remove support for emulating mbrtowc() and wcrtomb() in terms of the
old rune interface now that it is no longer needed.
2004-04-04 11:31:29 +00:00
Tim J. Robbins
4f6d4aa30d Reimplement the GB18030 encoding method using the new-style (mbrtowc()/
wcrtomb()) interface.
2004-04-04 11:00:42 +00:00
Tim J. Robbins
54c61797df Reimplement the deprecated UTF2 encoding method using the UTF-8 code
as a base. mbrtowc() and wcrtomb() are now implemented directly
instead of being emulated with sgetrune() and sputrune().
2004-04-04 10:49:45 +00:00
Tim J. Robbins
6de4bcc717 Add cross-references to isideogram(3), isphonogram(3), isrune(3),
isspecial(3) and wctype(3).
2004-03-30 08:11:57 +00:00
Tim J. Robbins
32d9553d83 Add basic manual pages for isideogram(), isphonogram(), isrune()
and isspecial().
2004-03-30 07:23:54 +00:00
Tim J. Robbins
bee1de57ca Trim cross-references. 2004-03-30 07:19:35 +00:00
Tim J. Robbins
ba6699086d Document the isnumber() and ishexnumber() functions, and explain how they
differ (at least in theory) from isdigit() and isxdigit().
2004-03-30 07:02:04 +00:00
Tim J. Robbins
ab02b93f75 Remove duplicate MLINK. 2004-03-29 21:46:52 +00:00
Tim J. Robbins
97062607cd Recognize the "rune" character class in wctype(). 2004-03-27 08:59:21 +00:00
Diomidis Spinellis
3f0a01ea87 Make consistent with the better written wcsrtombs function:
- Fix syntax
- Remove the (slightly wrong) duplicate explanation of the error condition
- Change reference to invalid multibyte character into invalid wide character
2004-02-27 15:03:22 +00:00
Andrey A. Chernov
41ddc53bca LC_ALL does not always take priority over other LC_*
Obtained from:  NetBSD
PR:             62047
2004-01-31 19:15:32 +00:00
Andrey A. Chernov
e6e9fb749a Add reference to environ(7) 2004-01-29 09:27:24 +00:00
Jacques Vidrine
84d9142f58 Remove unused variables and function declarations. Add missing headers. 2004-01-06 18:26:15 +00:00
Andrey A. Chernov
ad4688e131 Properly advance "x/y/z" form slash-pointers in some rare cases
PR:             60539
2003-12-24 10:16:46 +00:00
Andrey A. Chernov
6abda1f093 First byte of GBK-like sequences is 0x81, not 0x80 2003-12-19 12:54:42 +00:00
Tim J. Robbins
40c5c1f8a1 Set __mbrtowc and __wcrtomb correctly when changing to the C/POSIX locale.
Save __mbrtowc and __wcrtomb and restore them when changing back to
the cached locale.

Reported by:	perky
2003-12-08 23:52:22 +00:00
Tim J. Robbins
bc0b3a1800 Split multibyte(3) into separate manual pages for each function.
Instead of just deleting it, turn the original page into a general
overview of the multibyte character conversion functions, somewhat
similar to stdio(3).
2003-12-07 06:33:52 +00:00
Tim J. Robbins
da44487bd7 Split the documentation for localeconv() off into a separate manual page. 2003-12-07 06:00:00 +00:00
Tim J. Robbins
8962b7a518 Update cross references after utf2/euc move. 2003-11-15 02:26:04 +00:00
Tim J. Robbins
f76c65296c Remove section 4 versions of these manual pages, they have been
moved into section 5.
2003-11-15 02:15:25 +00:00
Tim J. Robbins
93584b12e6 Install the section 5 versions of EUC and UTF2 manual pages instead of
the section 4 versions.
2003-11-15 02:13:09 +00:00
Tim J. Robbins
ee0694adb9 Update the EUC and UTF2 manual pages for their new home in section 5.
These have been repo-copied from euc.4 and utf2.4.
2003-11-15 01:54:46 +00:00
Tim J. Robbins
b1c572ad5b Fix a typo that caused mbrtowc() to always return 0. 2003-11-11 07:25:05 +00:00
Tim J. Robbins
cc7a3285a5 Add one more cross-reference to gb2312(5). 2003-11-08 03:23:11 +00:00
Tim J. Robbins
16854d3c8f Add cross-references to new gb2312(5) manual page. 2003-11-08 03:07:56 +00:00
Tim J. Robbins
e31d6d8149 Add a fairly simple manual page for the new GB2312 encoding. 2003-11-08 03:02:45 +00:00
Tim J. Robbins
9e0bd333f0 Remove unused #includes. 2003-11-08 02:58:37 +00:00
Tim J. Robbins
eb402e14d8 Use __inline instead of inline. 2003-11-08 02:56:03 +00:00
Tim J. Robbins
c2f9330393 Refer to wide characters instead of runes. Remove redundant example locale.
Catch up with renaming of "Japanese" to "ja_JP.eucJP". Comment out the
statement that EUC is provided for compatibility with UNIX-based systems;
this is not a very good opening paragraph.
2003-11-08 02:52:31 +00:00
Tim J. Robbins
5d9c483db1 Refer to wide characters instead of runes. 2003-11-08 02:46:02 +00:00
David Xu
6d7a04b013 Add gb2312 encoding. 2003-11-05 22:52:51 +00:00
Tim J. Robbins
90c7d99f5b Implement mbrtowc() and wcrtomb() directly (sync with big5.c). 2003-11-05 07:56:45 +00:00
Tim J. Robbins
02f4f60ad5 Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement
mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left
unconverted; GB18030 will be done eventually, but GBK and UTF2 may just
be removed, as they are subsets of GB18030 and UTF-8 respectively.
2003-11-02 10:09:33 +00:00
Tim J. Robbins
d390e53270 Remove TODO comment about creating a macro version of towctrans().
Remove unnecessary inclusion of <ctype.h>.
2003-11-01 08:20:58 +00:00
Tim J. Robbins
d4f6cd06dd Allow mbrtowc() and wcrtomb() to be implemented directly, instead of
as wrappers around the deprecated 4.4BSD rune functions. This paves the
way for state-dependent encodings, which the rune API does not support.
- Add __emulated_sgetrune() and __emulated_sputrune(), which are
  implementations of sgetrune() and sputrune() in terms of
  mbrtowc() and wcrtomb().
- Rename the old rune-wrapper mbrtowc() and wcrtomb() functions to
  __emulated_mbrtowc() and __emulated_wcrtomb().
- Add __mbrtowc and __wcrtomb function pointers, which point to the
  current locale's conversion functions, or the __emulated versions.
- Implement mbrtowc() and wcrtomb() as calls to these function pointers.
- Make the "NONE" encoding implement mbrtowc() and wcrtomb() directly.

All of this emulation mess will be removed, together with rune support,
in FreeBSD 6.
2003-11-01 05:13:13 +00:00
Tim J. Robbins
1e8742e9cd Don't bother passing a freshly-zeroed mbstate to mbsrtowcs() etc.
when the current implementation won't use it, anyway. Just pass NULL.
This will need to be changed when state-dependent encodings are
supported, but there's no need to take the performance hit
in the meantime.
2003-10-31 13:29:00 +00:00
Tim J. Robbins
cf651e6b5c Implement fgetrune(), fungetrune() and fputrune() as wrappers around
fgetwc(), ungetwc() and fputwc().
2003-10-31 10:55:19 +00:00
Tim J. Robbins
4539e95a0f Remove incomplete support for running FreeBSD userland on old NetBSD kernels
lacking the issetugid() and utrace() syscalls.
2003-10-29 10:45:01 +00:00
Ruslan Ermilov
fe08efe680 mdoc(7): Use the new feature of the .In macro. 2003-09-08 19:57:22 +00:00
Tim J. Robbins
4ae3aa59ef Remove an unused and incorrect prototype for _none_init(). 2003-09-05 09:01:31 +00:00
Tim J. Robbins
e43ffa4159 Fix the case of the encoding name in the ENCODING line. Names are
case-sensitive, and MSKANJI does not work.
2003-08-10 11:41:38 +00:00
Tim J. Robbins
dcb2df4c22 Cross-reference gbk(5). 2003-08-10 11:38:28 +00:00
Tim J. Robbins
dd5e8fdef8 Cross-reference gbk(5) now that it exists. Fix a copy & paste error:
one occurrence of GB 18030 should have been 11383.
2003-08-10 11:36:42 +00:00
Tim J. Robbins
f6d8a447d1 Add a fairly minimal manual page for the GBK encoding. 2003-08-10 11:34:35 +00:00
Tim J. Robbins
9e09ac8597 Add a cross reference to Unicode 3.0. 2003-08-10 11:26:18 +00:00
Tim J. Robbins
39e2a81e3f Add cross references to the new character encoding manual pages,
and to mbsinit(3) while I'm at it.
2003-08-10 09:25:52 +00:00
Tim J. Robbins
8ca5fa518c Add manual pages for the BIG5, GB18030 and MSKanji encodings. These may
need to be fleshed out a little, especially big5(5).
2003-08-10 09:23:51 +00:00
Tim J. Robbins
b85aa4e3f7 Implement mblen(s, n) as mbtowc(NULL, s, n) to avoid calling sgetrune()
and to simplify things. This is only valid until we start supporting
state-dependent encodings.
2003-08-07 09:34:51 +00:00
Tim J. Robbins
b69a98d6d3 Implement mbstowcs() as a wrapper around mbsrtowcs(), and wcstombs()
as a wrapper around wcsrtombs().
2003-08-07 08:04:01 +00:00
Tim J. Robbins
998e124837 Implement mbtowc() in terms of mbrtowc(), and wctomb() in terms of wcrtomb(). 2003-08-07 07:59:36 +00:00
Tim J. Robbins
dab4fca49b Implement btowc() in terms of mbrtowc() instead of sgetrune(), and
wctob() in terms of wcrtomb() instead of sputrune(). There should be
no functional differences, but there may be a small performance hit
because we make an extra function call.

The aim here is to have as few functions as possible calling
s{get,put}rune() to make it easier to remove them in the future.
2003-08-07 07:45:35 +00:00
Andrey A. Chernov
a9d25ab17f Restore including of "collate.h", for its own prototype (mis)match detection 2003-08-03 19:28:23 +00:00
Andrey A. Chernov
8841d0081c Remove commented out and never used code 2003-08-03 05:20:31 +00:00
Andrey A. Chernov
17f67afe28 Remove __collate_range_cmp() stabilization, it conflicts with ranges 2003-08-03 04:40:40 +00:00
Andrey A. Chernov
a03081087c Add support for gb18030 encoding
PR:             51729
Submitted by:   Kang Liu <liukang@bjpu.edu.cn>
2003-07-29 07:52:44 +00:00
Andrey A. Chernov
8b2749e901 Add const to __setrunelocale prototype 2003-07-06 04:01:09 +00:00
Andrey A. Chernov
68d429c3fc Reorganize wrapper around setrunelocale() to mark it as deprecated
in FreeBSD 6
2003-07-06 02:03:37 +00:00
Alexey Zelkin
683fe11379 . style(9)
. fix/add comments (to cover changes done thru last 20 months)
. extend monetary testcase to cover int_* values
2003-06-26 10:46:16 +00:00
Alexey Zelkin
fca2738d67 Reduce code duplication by separating _PathLocale detection code into
internal helper function.
2003-06-25 22:42:33 +00:00
Alexey Zelkin
93c847344b Move _PathLocale declaration to more logical place (setlocale.c) 2003-06-25 22:34:13 +00:00
Alexey Zelkin
d8d4841398 Catch up with _PATH_LOCALE move from rune.h to paths.h 2003-06-25 22:31:42 +00:00
Tim J. Robbins
77156cb782 Mark the following interfaces as OBSOLETE_IN_6:
fgetrune(), fputrune(), fungetrune(), mbrune(), mbrrune(), mbmb(),
    setinvalidrune(), UTF2 encoding method.
These have been marked as being deprecated in their manual pages since 5.0,
and their use causes a linker warning.
2003-06-13 07:13:54 +00:00
Jordan K. Hubbard
3dfdc427f1 Fixes to locale code to properly use indirect pointers in order to prevent
memory leaks (fixes bugs earlier purported to be fixed).
Submitted by:	Ed Moy <emoy@apple.com>
Obtained from:	Apple Computer, Inc.
MFC after:	2 weeks
2003-06-13 00:14:07 +00:00
Andrey A. Chernov
0c7fbc6c40 Remove transition period hack 2003-06-10 01:26:04 +00:00
Andrey A. Chernov
9d793e98ec Add GBK encoding
PR:             51504
Submitted by:   Statue <statue@freebsd.sinica.edu.tw>
2003-06-01 15:30:56 +00:00
Ruslan Ermilov
3a5146d9e2 Assorted mdoc(7) fixes.
Approved by:	re (blanket)
2003-05-22 13:02:28 +00:00
Jacques Vidrine
d05090827f Back out the `hiding' of strlcpy and strlcat. Several people
vocally objected to this safety belt.
2003-05-01 19:03:14 +00:00
Jacques Vidrine
5723e501ab `Hide' strlcpy and strlcat (using the namespace.h / __weak_reference
technique) so that we don't wind up calling into an application's
version if the application defines them.

Inspired by:	qpopper's interfering and buggy version of strlcpy
2003-04-29 21:13:50 +00:00
Tim J. Robbins
e3e8878435 When called with s == NULL, behave as if wc == L'\0' as required by the
standard.
2003-04-10 09:20:38 +00:00
Andrey A. Chernov
cfcd9a45b5 According to C99 decimal_point can't be the empty string, mention it. 2003-03-20 08:13:34 +00:00
Andrey A. Chernov
befb332a6b decimal_point can't be "" according to C99, so set it to standard "."
in that case.
2003-03-20 08:05:20 +00:00
Tim J. Robbins
542bd65fcb MFp4: Implementations of the wcstof() and wcstold() functions. 2003-03-13 06:29:53 +00:00
Tim J. Robbins
60bf07bd33 Fix a bad free() call that would occur if some #if 0'd code was used. 2003-02-22 00:06:05 +00:00
Jacques Vidrine
6d7bd75a4e Whack 28 unused variables. 2003-02-18 13:39:52 +00:00
Philippe Charnier
d649825182 The .Fn function 2003-02-06 11:04:47 +00:00
Jens Schweikhardt
9d5abbddbf Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
Ruslan Ermilov
facc67676f mdoc(7) police: Deal with self-xrefs. 2002-12-24 13:41:48 +00:00
Ruslan Ermilov
2efeeba554 mdoc(7) police: "The .Fa argument.". 2002-12-19 09:40:28 +00:00
Ruslan Ermilov
5c564bae0a mdoc(7) police: Fixed abuses of the .Ar and .Em macros. 2002-12-18 13:33:04 +00:00
Ruslan Ermilov
1fae73b137 mdoc(7) police: "The .Fn function". 2002-12-18 12:45:11 +00:00
Ruslan Ermilov
db8993ce9e Capitalize ASCII code names.
Approved by:	re
2002-12-05 08:50:00 +00:00
Ruslan Ermilov
279062fae1 mdoc(7) police: sweep. 2002-11-29 17:35:09 +00:00
Ruslan Ermilov
92b1f2f7a3 mdoc(7) police: sweep. 2002-11-29 16:42:23 +00:00
Ruslan Ermilov
c51d717f0c libc_r wasn't so tied to libc for 22 months. 2002-11-18 09:50:57 +00:00
Tim J. Robbins
b18146b4c2 Add cross references to mbrtowc(3) and wcrtomb(3). 2002-11-10 11:14:58 +00:00
Tim J. Robbins
2f5154a2c1 Don't check whether the first byte of the buffer is a null byte when
the buffer has zero length (n == 0).
2002-11-10 10:49:14 +00:00
Tim J. Robbins
7183f43d95 Describe the n and ps arguments to mbrlen(). 2002-11-09 10:21:01 +00:00
Tim J. Robbins
f4937dbebc Typo: pointer to -> pointed to 2002-11-09 09:47:06 +00:00
Tim J. Robbins
490eeb06b4 Use wide character ctype functions directly instead of relying on
4.4BSD extensions to the single-byte ctype functions.
2002-11-09 05:19:08 +00:00
Tim J. Robbins
39df93ae41 Add a missing return statement for the pwcs == NULL case (XSI extension). 2002-11-09 04:13:26 +00:00
Tim J. Robbins
f6b767e33f Add two additional references to the See Also section, which contain much
better descriptions of UTF-8 and related issues.
2002-10-30 11:49:05 +00:00
Tim J. Robbins
a019c0e525 Remove unnecessary inclusion of <rune.h> to make it obvious that this file
does not use the deprecated rune system.
2002-10-29 09:03:57 +00:00
Tim J. Robbins
c5929b304e Handle boundary cases more correctly; mblen(s, 0) and mbtowc(NULL, s, 0)
return -1 regardless of what s points to, mbtowc(&w, s, 1) sets w to a
null wide character when s points to a null byte. This seems to be closer
to what most other implementations do, but the C99 standard contradicts
itself for these cases.
2002-10-28 08:24:46 +00:00
Garrett Wollman
688dfe4533 Do not include <sys/syslimits.h> directly; it is not intended for general
consumption.
2002-10-27 17:44:33 +00:00
Tim J. Robbins
b6f33850e0 Style sweep. 2002-10-27 10:41:21 +00:00
Tim J. Robbins
583efa1268 Use an internal buffer for the result when the first argument is NULL. 2002-10-25 13:24:45 +00:00
Tim J. Robbins
9acd2d9b3c Avoid truncating invalid wide characters that are outside the range of
'unsigned char'; signal an error instead.
2002-10-16 11:37:38 +00:00
Tim J. Robbins
0b78986fe2 FA, FB and FC are lead bytes according to recent Microsoft documentation. 2002-10-14 01:50:45 +00:00
Tim J. Robbins
d891f26821 Style changes. Mainly removing excessive whitespace and parens. 2002-10-14 01:46:18 +00:00
Andrey A. Chernov
8a093dade3 Cosmetic: use LCMONETARY_SIZE_{FULL,MIN} defines like in other places 2002-10-12 11:31:07 +00:00
Tim J. Robbins
972baa3747 Add a UTF-8 encoding method, which will eventually replace the antique
"UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible
for 16-bit characters, the new UTF-8 implementation is much more strict
about rejecting malformed input and also handles the full 31 bit range
of characters.
2002-10-10 22:56:18 +00:00
Tim J. Robbins
f4da1a754d Add support for the 6 new C99 struct lconv members dealing with formatting
international monetary values: int_p_cs_precedes, int_n_cs_precedes,
int_p_sep_by_space, int_n_sep_by_space, int_p_sign_posn, int_n_sign_posn.
This should not break existing binaries or LC_MONETARY data files.

Reviewed by:	ache
MFC after:	1 month
2002-10-09 09:19:28 +00:00
Tim J. Robbins
d9e5246b17 Add a note to the Compatibility section suggesting that these functions
only be used for byte values. Add cross-references to the wide-char
counterparts.
2002-10-06 10:15:38 +00:00
Tim J. Robbins
82f520853b Remove rants/whines about the rune interface being superior to the
ISO C interface.
2002-10-06 06:03:23 +00:00
Tim J. Robbins
bc98899df0 Remove a completely incorrect statement from the Return Values section.
Add cross-references to the restartable multibyte functions (mbrlen(3) etc.)
2002-10-06 05:58:24 +00:00
Tim J. Robbins
17f6e5b0e7 Improve three instances of questionable or confusing grammar. 2002-10-03 14:09:06 +00:00
Tim J. Robbins
28ddc4138c Add an example. 2002-10-03 14:07:26 +00:00
Tim J. Robbins
b06b097805 Document towlower() and towupper() in separate manual pages instead of
trying to confusingly document both on the same page. The new manual pages
are based on tolower(3) and toupper(3) instead of the old towlower(3).
2002-10-03 11:23:06 +00:00
Tim J. Robbins
9981ef2702 Point out that although toupper() and tolower() really accept rune_t's
and not just unsigned char's, callers should use towupper() and towlower()
instead when working with wide characters if portability is a concern.
2002-10-03 11:14:00 +00:00
Tim J. Robbins
73d6e4a5a2 towlower() appeared twice in the synopsis; one of the occurrences should
have been towupper(). Add towupper() to the Name section while I'm at it.

Obtained from:	NetBSD (junyoung)
2002-10-03 10:40:01 +00:00
Tim J. Robbins
f2a67ef1bd Add an Examples section with an example of how to use the functions. 2002-10-03 08:49:29 +00:00
Tim J. Robbins
03ab141313 Warn when setinvalidrune() is referenced for consistency with the rest
of the rune functions (except sgetrune() and sputrune(), which are really
macros).
2002-09-24 09:25:37 +00:00
Tim J. Robbins
1302dabd28 Add the remaining C99 wide character string to integer conversion functions.
Restrict qualifiers were added to the existing prototypes in <inttypes.h>
and the typedef for wchar_t was removed.
2002-09-22 08:06:45 +00:00
Tim J. Robbins
530bb9225d Deprecate the rest of the rune interface. 2002-09-18 06:19:12 +00:00
Tim J. Robbins
7948cae0d2 Mark mbmb(), mbrune(), and mbrrune() as deprecated functions. We want to
get applications to move to the ISO C interfaces as well as have the
freedom to replace the rune interfaces with ones that support stateful
conversions some time in the future.
2002-09-18 06:11:21 +00:00
Tim J. Robbins
03b716c4bd Add wcstod() as a wrapper around strtod(). It does not handle any characters
that strtod() does not (alternate digit characters, etc. are not handled).
2002-09-15 08:38:51 +00:00
Tim J. Robbins
528bebffb1 Use the heap instead of the stack to store temporary multibyte string
buffers; this is slower but safer for threaded programs where threads
often have relatively low stack size limits.
2002-09-15 08:06:17 +00:00
Tim J. Robbins
3a67d8efd0 Correct type of second argument: it is wchar_t ** restrict,
not wchar_t * restrict.
2002-09-12 09:25:27 +00:00
Tim J. Robbins
47794211c8 Add an implementation of wcsftime() (wide character version of strftime()). 2002-09-11 08:57:11 +00:00
Tim J. Robbins
5fd1762445 Add wcstol() and wcstoul(), based on strtol() and strtoul(). 2002-09-08 13:27:26 +00:00
Tim J. Robbins
58d38e2520 Style: One space between "restrict" qualifier and "*". 2002-09-06 11:24:06 +00:00
Tim J. Robbins
f0c6c306f9 Set errno to EILSEQ when invalid multibyte sequences are detected
(XSI extension to 1003.1-2001).
2002-09-03 01:09:47 +00:00
Tim J. Robbins
d384a6795d Typo: refer to MB_LEN_MAX instead of MB_CHAR_MAX (which does not exist). 2002-09-01 07:21:58 +00:00
Tim J. Robbins
9771f1e24e Add restrict qualifiers to the arguments of mbstowcs, mbtowc() and
wcstombs().
2002-09-01 07:08:22 +00:00
Tim J. Robbins
a5f76f1911 Implement the XSI extension which allows the destination string to be
NULL, and returns the number of bytes that would be required to store
the result of the conversion without storing anything.

PR:		17694
2002-08-31 14:16:12 +00:00
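The NULL-destination extension makes a two-pass "measure, then convert" pattern possible. The commit does not name the function involved; the sketch below assumes wcstombs(), for which POSIX documents this behavior, and the helper name `wide_to_mb_alloc` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert ws to a newly allocated multibyte
 * string. Returns NULL on conversion or allocation failure; the
 * caller frees the result.
 */
char *
wide_to_mb_alloc(const wchar_t *ws)
{
	size_t need;
	char *buf;

	assert(ws != NULL);
	/* First pass: NULL destination, so only the size is computed. */
	need = wcstombs(NULL, ws, 0);
	if (need == (size_t)-1)
		return (NULL);
	buf = malloc(need + 1);		/* +1 for the terminating null */
	if (buf == NULL)
		return (NULL);
	/* Second pass: perform the conversion into the sized buffer. */
	if (wcstombs(buf, ws, need + 1) == (size_t)-1) {
		free(buf);
		return (NULL);
	}
	return (buf);
}
```

Without the size query, a caller would have to guess a buffer size, for example `MB_CUR_MAX` bytes per wide character.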
Tim J. Robbins
7438fc3aa8 Split ansi.c into a separate source file for each function. 2002-08-31 11:26:55 +00:00
Andrey A. Chernov
c14170612e Use ntohl() to read chains number in new format 2002-08-31 01:05:39 +00:00
Andrey A. Chernov
cbc98d0541 Style fix 2002-08-30 20:39:53 +00:00
Andrey A. Chernov
8e52da4dfc Prepare for switching to unlimited chains format.
Optimize chains lookup a bit.
2002-08-30 20:26:02 +00:00
Mike Barcroft
abbd890233 o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
  macros, which are only MD because of gratuitous differences between
  architectures.
o Change all headers to make use of this.  This mainly involves
  changing:
    #ifdef _BSD_FOO_T_
    typedef	_BSD_FOO_T_	foo_t;
    #undef _BSD_FOO_T_
    #endif
  to:
    #ifndef _FOO_T_DECLARED
    typedef	__foo_t	foo_t;
    #define	_FOO_T_DECLARED
    #endif

Concept by:	bde
Reviewed by:	jake, obrien
2002-08-21 16:20:02 +00:00
Tim J. Robbins
7d77551c77 Add a manual page for wcwidth(). 2002-08-20 03:42:21 +00:00
Andrey A. Chernov
c1a0a78d00 Remove wcswidth.c from here (and move it to "string") 2002-08-20 01:59:26 +00:00
Andrey A. Chernov
8077fafd28 Remove space at the end of continuation line in prev. commit 2002-08-20 01:16:06 +00:00
Andrey A. Chernov
f999b4ba69 Implement wcswidth() 2002-08-19 20:46:10 +00:00
Andrey A. Chernov
1da6b56aca Use modern-style arguments declaration 2002-08-19 20:32:27 +00:00
Andrey A. Chernov
853c779d87 Write null wide-character as L'\0' like in other places 2002-08-19 20:12:38 +00:00
Andrey A. Chernov
1e2cd54448 According to SUSv2, always return 0 for null wide-character code 2002-08-19 18:06:18 +00:00
Andrey A. Chernov
9424df445a Move internal defines from ctype.h here 2002-08-19 09:02:49 +00:00
Tim J. Robbins
e92a3d83fc Implement the ISO C90 Amd.1 restartable wide and multibyte character
manipulation functions mbrlen(), mbrtowc(), mbsinit(), mbsrtowcs(),
wcrtomb(), wcsrtombs().
2002-08-18 06:30:10 +00:00
Andrey A. Chernov
d8d0cebecd Move wcwidth() to separate file, it doesn't belong to iswctype.c at all 2002-08-17 20:30:34 +00:00
Andrey A. Chernov
1c15ec1eab According to SUSv2, wcwidth() should return -1 for non-printing characters 2002-08-17 20:11:31 +00:00
Andrey A. Chernov
88c669d2ea Cosmetic - remove unneeded brackets and #undef 2002-08-17 20:03:44 +00:00
Andrey A. Chernov
c87e6b26b0 wcwidth: fix expression to work correctly with SWIDTH0 2002-08-17 14:16:14 +00:00
Michael C . Wu
dff784192b Add iswctype wcwidth function code
Submitted by:	clkao@clkao.org
Reviewed by:	keichii
Obtained from:	NetBSD
MFC after:	1 month
2002-08-16 13:45:23 +00:00
Andrey A. Chernov
a2a26d0a3d Reduce BSS size for programs which do not load collate by eliminating
static buffer.
2002-08-13 14:55:17 +00:00
Andrey A. Chernov
e34fe8a408 Now malloc() is fixed, remove errno hardcoding to ENOMEM 2002-08-12 17:14:04 +00:00
Andrey A. Chernov
ec5ca2eba7 Add safeguards to never use errno == 0 as setrunelocale() error return code 2002-08-09 08:22:29 +00:00
Andrey A. Chernov
76692b8025 Rewrite locale loading procedures, so any load failure will not affect
currently cached data.  It allows a number of nice things, like: removing
fallback code from single locale loading, remove memory leak when LC_CTYPE
data loaded again and again, efficient cache use, not only for
setlocale(locale1); setlocale(locale1), but for setlocale(locale1);
setlocale("C"); setlocale(locale1) too (i.e.  data file loaded only once).
2002-08-08 05:51:54 +00:00
Andrey A. Chernov
57473ad215 Reset __mb_cur_max to 1 when "C" or "POSIX" locales loaded after multibyte one 2002-08-07 20:49:25 +00:00
Andrey A. Chernov
45206d5c69 Fix wrong address when EucInfo > "variable" size 2002-08-07 20:20:56 +00:00
Andrey A. Chernov
6892b144e8 Style fixes in preparation for rewriting 2002-08-07 18:02:45 +00:00
Andrey A. Chernov
ecc4c62066 Style fixes in preparation for code rewriting 2002-08-07 16:45:23 +00:00
Tim J. Robbins
71a63bac1b Build iswctype.c and manual pages for the functions it defines. 2002-08-06 00:49:59 +00:00
Tim J. Robbins
677adc79c2 Add missing prototypes for extension functions to the SYNOPSIS. 2002-08-05 11:02:04 +00:00
Tim J. Robbins
21b7821a9e Use In macro instead of Fd. Add crossref to wctype(3). Refer to 1003.1-2001
in STANDARDS section. Document functions which are extensions to the standard.
2002-08-05 10:50:39 +00:00
Tim J. Robbins
15c57797bb Use the In macro instead of Fd. Add crossref to wctrans(3). Refer to
1003.1-2001 in STANDARDS section.
2002-08-05 10:48:05 +00:00
Tim J. Robbins
6b44a04d1c Implement the missing <wctype.h> functions: isw*() (iswalnum() etc.),
towlower() and towupper() required by ISO C90 Amd. 1.

iswascii(), iswhexnumber(), iswideogram(), iswnumber(), iswphonogram(),
iswrune() and iswspecial() have also been implemented for consistency
with the BSD extensions in <ctype.h>.
2002-08-05 10:45:23 +00:00
Andrey A. Chernov
3a317a1229 Reject encoding > ENCODING_LEN at early stage instead of truncating it.
Use ptr == NULL instead of !ptr in few places.
Move saverr declaration to global section.
2002-08-05 09:58:45 +00:00
Tim J. Robbins
008a2c53ce Manual pages for wide character classification (isw*) and case conversion
(tow*) functions from NetBSD, unmodified except for the addition of $FreeBSD$.

Obtained from:	NetBSD
2002-08-05 08:04:58 +00:00
Tim J. Robbins
4bd5585fbd Change wctype_t to an unsigned type to avoid warnings. 2002-08-04 12:43:53 +00:00
Tim J. Robbins
4645079944 Add the ISO C90 Amd. 1 wctrans(3) and towctrans(3) functions. 2002-08-04 12:09:08 +00:00
Tim J. Robbins
92ece88d16 Add btowc(3) to SEE ALSO section. 2002-08-04 11:02:21 +00:00
Andrey A. Chernov
f75bb0aa25 Use errno to indicate failure reason.
Remove incomplete checks for 'name' and 'PatchLocale'; they must
already be checked at this point.
2002-08-04 09:37:28 +00:00
Bruce Evans
1a2140f531 Fixed some style bugs (unsorting of MLINKS, and more than 1 assignment to
MAN per section).
2002-08-04 07:54:41 +00:00
Andrey A. Chernov
10bc1114ce Rewrite loadlocale() to eliminate LOAD_CATEGORY macro to save space. 2002-08-04 04:29:54 +00:00
Andrey A. Chernov
9bb322433e Add ERRORS section according to POSIX (no errors) 2002-08-03 17:20:45 +00:00
Andrey A. Chernov
2f6754febb Catch empty encoding name too 2002-08-03 17:09:21 +00:00
Andrey A. Chernov
40b97dcb2a Fix return codes to match what setrunelocale() returns 2002-08-03 16:26:47 +00:00
Andrey A. Chernov
5740f28044 Preserve errno in fallback code 2002-08-03 15:56:25 +00:00
Tim J. Robbins
e9fb70115f Add ISO C90 Amd. 1 btowc(3) and wctob(3) functions. 2002-08-03 13:49:55 +00:00
Tim J. Robbins
196099d661 Correct use of Nm macro in NAME section and a broken cross reference. 2002-08-03 12:39:41 +00:00
Andrey A. Chernov
710d708144 Return errno provided by fopen, not always ENOENT.
Return EFTYPE instead of EINVAL for wrong locale file format.
Whitespaces.
2002-08-03 11:55:19 +00:00
Andrey A. Chernov
256ddd5999 Check encoding for ".", ".." and / inside 2002-08-03 10:23:06 +00:00
Andrey A. Chernov
5568219d15 Return EINVAL for NULL or too long encoding, not EFAULT 2002-08-03 09:10:31 +00:00
Andrey A. Chernov
83c9580dbb Return ENAMETOOLONG for long PATH_LOCALE, not EFAULT 2002-08-03 09:07:27 +00:00
Andrey A. Chernov
a17eafe2a8 1) Use errno to indicate failure reason.
2) Move incomplete check for / in locale name from env section to
loadlocale(), add check for "." and ".." too.
This allows any argument to be checked, not only env.
3) Redesign LOAD_CATEGORY macro to eliminate code duplication.
4) Try harder in fallback code: if old locale can't be restored,
load "C" locale
5) White space formatting, long lines, etc.
2002-08-03 09:04:44 +00:00
Tim J. Robbins
5b32667c57 Add ISO C90 Amd. 1 wctype(3) and iswctype(3) functions. 2002-08-03 04:18:40 +00:00
Andrey A. Chernov
4e7b46d8e2 Slightly modify previous out-of-bounds fix: just break instead of
return(NULL), for upward compatibility with more LC_* categories that may be
implemented in future.
2002-08-02 13:36:54 +00:00
Andrey A. Chernov
ef1e7a2656 Prevent out-of-bounds writing in the too-many-slashes case.
Replace strncpy + ='\0' with strlcpy

MFC after:	1 day
2002-08-02 01:04:49 +00:00
Jeroen Ruigrok van der Werven
eb12e52a25 Remove the hard-coded limit of 3 bytes for EUC encodings.
Satoshi NIIMI-san kindly explained that EUC does not limit the byte length to
any arbitrary number.

We now set the limit to the maximum octet length of the codeset and it is
locale-specific.

Submitted by:	Yong-Jhen Hong <winard@ms11.url.com.tw>
2002-04-14 10:55:42 +00:00
Dag-Erling Smørgrav
0b759b867a Install digittoint.3 (forgotten in rev 1.21)
PR:		docs/26451
Submitted by:	Adrian Filipi-Martin <adrian@ubergeeks.com>
2002-04-13 22:32:33 +00:00
Dima Dorfman
62538f5a03 This was recently MFC'd, so it will appear in 4.6.
PR:		37018
2002-04-13 04:25:56 +00:00
Jeroen Ruigrok van der Werven
a243e676fe Fix EUC encoding conversion for codeset 3 and 4 to comply with the specification.
PR:		28552
Submitted by:	NIIMI Satoshi <sa2c@and.or.jp>
2002-04-07 16:37:15 +00:00
Mark Murray
4cd0119367 Do not use __progname directly (except in [gs]etprogname(3)).
Also, make an internal _getprogname() that is used only inside
libc. For libc, getprogname(3) is a weak symbol in case a
function of the same name is defined in userland.
2002-03-29 22:43:43 +00:00
David E. O'Brien
333fc21e3c Fix the style of the SCM ID's.
I believe I have made all of libc .c's as consistent as possible.
2002-03-22 21:53:29 +00:00
David E. O'Brien
c05ac53b8b Remove __P() usage. 2002-03-21 22:49:10 +00:00
Ruslan Ermilov
c8c5079d10 mdoc(7) police: tidy up. 2002-03-18 15:44:27 +00:00
Ruslan Ermilov
dab055db89 bde got caught by mdoc(7) police. :-) 2002-03-15 17:53:20 +00:00
Ruslan Ermilov
9a04350e3d mdoc(7) police: don't you notice the warnings? 2002-03-15 17:50:21 +00:00
Mike Barcroft
fd8e4ebc8c o Move NTOHL() and associated macros into <sys/param.h>. These are
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated macros in the
  source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
  Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
  POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
  and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
  complexities associated with having MD (asm and inline) versions, and
  having to prevent exposure of these functions in other headers that
  happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
  third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.

Tested on:	alpha, i386
Reviewed by:	bde, jake, tmm
2002-02-18 20:35:27 +00:00
Andrey A. Chernov
831e8f614c Do not try to convert to char already converted C monetary locale members.
Do this conversion on locale load stage instead.
2002-01-28 08:26:38 +00:00
Alexey Zelkin
a2fb0481d7 get __time_load_locale() prototype from include file, rather than declare
own
2002-01-24 15:38:59 +00:00
Andrey A. Chernov
ff7448f849 Restore C99 standard conformance information, isblank() _is_ in final
standard document

Pointed by: "Jacques A. Vidrine" <n@nectar.cc>
2002-01-22 20:14:35 +00:00
Bruce Evans
afac94af5c Replaced bogus cross references by the usual one for the ctype family
(ctype(3)).
2002-01-11 15:39:50 +00:00
Bruce Evans
87e0032026 Removed assertion that isblank() conforms to C90 too. This assertion
is correct but less than useful.  There is some uncertainty about whether
isblank() is in C99, but it is certainly not in C90.  It just conforms
to C89 because it is a conforming extension.
2002-01-11 15:21:03 +00:00
Bruce Evans
5fb3acfaaf Fixed unsorting of almost all lists in previous commit.
Removed assertion that isblank() is in C99 here too.
2002-01-11 15:15:17 +00:00
Bruce Evans
758671eb0d Fixed unsorting of MLINKS in previous commit.
Fixed unsorting of SRCS in rev.1.18.
2002-01-11 14:57:11 +00:00
Nik Clayton
6a3003ce51 Remove assertion that isblank() is in C99, pointed out by ache. 2002-01-10 12:22:00 +00:00
Nik Clayton
26dabb6003 From the PR:
1. ctype.h defines digittoint(), isnumber() and ishexnumber(), yet
   they are not documented in any of the manpages.

2. The ctype manpage references a non-existent manpage for
   digittoint().

3. The isascii() manpage claims it is standards compliant, when
   it isn't.

4. isblank() claims it is _not_ standards compliant, when it
   is.

Fix by including the appropriate .Nm entries, and with a new digittoint.3
page.

PR:		docs/26451
Submitted by:	Adrian Filipi-Martin <adrian@ubergeeks.com>
2002-01-09 13:43:31 +00:00
Alexey Zelkin
0388ec7cac Back out recent replacement of LC_MESSAGES file with directory.
Requested by:   ache
2001-12-24 11:49:49 +00:00
Alexey Zelkin
709eed76bd Slightly rework the locale messages storage scheme. Before this commit,
LC_MESSAGES-related data was installed to the <locale>/LC_MESSAGES file.
Now it goes to the <locale>/LC_MESSAGES/SYS_LC_MESSAGES file. The LC_MESSAGES
directory is meant to hold the message catalogs of userland tools.
This should let us avoid many potential problems when libintl-related
functionality is introduced in the future.

Thanks to Ruslan Ermilov <ru> for useful suggestions on the correct way to
replace plain files with directories at the installworld stage.
2001-12-21 13:14:02 +00:00
Alexey Zelkin
f43a321bf0 style(9)'ify 2001-12-20 18:28:52 +00:00
Alexey Zelkin
f2d0f4274b Add my e-mail to copyrights 2001-12-20 15:30:02 +00:00
Ruslan Ermilov
de400a7b97 mdoc(7) police: use no-break space. 2001-12-12 13:46:15 +00:00
Ruslan Ermilov
f079ab33c6 mdoc(7) police: use no-break space, fix markup. 2001-12-12 13:45:35 +00:00
Ruslan Ermilov
5411c22ae5 mdoc(7) police: use non-break space, remove whitespace at EOL, fix markup. 2001-12-12 13:42:25 +00:00
Alexey Zelkin
74f2b97544 * Add my e-mail to copyrights
* style(9)'ify
2001-12-11 15:55:42 +00:00
Alexey Zelkin
b21656a8f4 Fix grouping string handling 2001-12-11 15:26:36 +00:00
Andrey A. Chernov
8ede26d773 Clarify ' ' space issue 2001-12-05 16:33:11 +00:00
Andrey A. Chernov
e3a3c468a5 Remove specific reference to ASCII space (' '); it is true for localized
spaces too
2001-12-02 12:31:44 +00:00
Andrey A. Chernov
f2f94c9675 Clarify isblank range 2001-11-30 05:39:08 +00:00
Andrey A. Chernov
a72d401cce Clarify valid isspace() range 2001-11-30 02:01:32 +00:00
Andrey A. Chernov
307b922e38 Clarify that is[x]digit() class is the same in any locale 2001-11-29 15:23:46 +00:00