51 Commits

Author SHA1 Message Date
Andrey A. Chernov
67e7bdee55 Fix longstanding segfault in mb/wc functions if an error occurs
inside _<encoding>_init().
Currently only _EUC_init() was affected.
2008-01-23 03:05:35 +00:00
Andrey A. Chernov
5ebf111155 Better fix for longstanding segfault. Don't touch current locale at all
on unknown encoding. Previous fix resets it to POSIX.
2008-01-23 02:17:27 +00:00
Andrey A. Chernov
5776848851 1) Add (void) cast to _none_init() (while I am here)
2) Fix longstanding segfault in mb/wc code when an unknown encoding is specified
in the locale file (mb/wc functions become NULL in that case).
2008-01-23 01:57:26 +00:00
Andrey A. Chernov
91e0bf6a77 Introduce new encoding: "ASCII"
It differs from default C/POSIX "NONE" mainly by stricter 8bit check
for mb*towc*/wc*tomb* family, returning EILSEQ
2008-01-21 23:48:12 +00:00
Andrey A. Chernov
367ed4e13d The problem is: currently our single byte ctype(3) functions are broken
for wide character locales in the argument range >= 0x80 - they may
return false positives.

Example 1: for UTF-8 locale we currently have:
iswspace(0xA0)==1 and isspace(0xA0)==1
(because iswspace() and isspace() are the same code)
but must have
iswspace(0xA0)==1 and isspace(0xA0)==0
(because the UTF-8 locale has no such single byte character, nor any other
in the range 0x80..0xff; it keeps only ASCII in the single byte
range because our internal wchar_t representation for UTF-8 is UCS-4).

Example 2: for all wide character locales isalpha(arg) when arg > 0xFF may
return false positives (must be 0).
(because iswalpha() and isalpha() are the same code)

This change addresses this issue by separating single byte and wide ctype
and also fixes iswascii() (currently iswascii() is broken for
arguments > 0xFF).
This change is 100% binary compatible with old binaries.

Reviewed by: i18n@
2007-10-13 16:28:22 +00:00
Warner Losh
c879ae3536 Per Regents of the University of California letter, remove advertising
clause.

# If I've done so improperly on a file, please let me know.
2007-01-09 00:28:16 +00:00
Alexey Zelkin
e94c6cb4a2 . Static'ize functions exported via function reference variables only.
. Replace inclusion of sys/param.h to sys/cdefs.h and sys/types.h where
  appropriate.
. move _*_init() prototypes to mblocal.h, and remove these prototypes
  from .c files
. use _none_init() in __setrunelocale() instead of duplicating code
. move __mb* variables from table.c to none.c, allowing us not to
  export _none_*() externs, and remove them from mblocal.h accordingly

Ok'ed by:	tjr
2005-02-27 15:11:09 +00:00
Andrey A. Chernov
27ecbe8a77 Remove setrunelocale()
2004-10-18 02:06:18 +00:00
Tim J. Robbins
31d330fb2a Remove the obsolete <rune.h> interface.
2004-10-17 06:51:50 +00:00
Tim J. Robbins
79a3948997 Remove support for the obsolete UTF2 encoding.
2004-10-17 02:29:15 +00:00
Tim J. Robbins
5a52f3c22c Change "deprecated" in link-time warnings about various rune functions
to "obsolete".
2004-08-21 07:48:06 +00:00
Tim J. Robbins
1949a3470f Implement the GNU extensions of mbsnrtowcs() and wcsnrtombs(). These are
convenient when the source string isn't null-terminated.

Implement the other conversion functions (mbstowcs(), mbsrtowcs(), wcstombs(),
wcsrtombs()) in terms of these new functions.
2004-07-21 10:54:57 +00:00
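
As a rough illustration of why mbsnrtowcs() is convenient for input that is not null-terminated, here is a minimal sketch. The helper name convert_prefix() is hypothetical and not part of the commit; it assumes a POSIX.1-2008 libc and the default "C" locale, and it terminates the output itself because mbsnrtowcs() does not write a terminator when it stops at the byte limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert at most nbytes of buf, which need not be
 * null-terminated, into out (capacity outlen wide characters).
 * Returns the number of wide characters stored, or (size_t)-1 on error.
 */
size_t
convert_prefix(const char *buf, size_t nbytes, wchar_t *out, size_t outlen)
{
	mbstate_t st;
	const char *src = buf;
	size_t n;

	if (outlen == 0)
		return ((size_t)-1);
	memset(&st, 0, sizeof(st));	/* start in the initial shift state */

	/* Reserve one slot so the result can always be terminated. */
	n = mbsnrtowcs(out, &src, nbytes, outlen - 1, &st);
	if (n == (size_t)-1)
		return (n);		/* invalid sequence; errno is EILSEQ */
	out[n] = L'\0';
	return (n);
}
```

Calling convert_prefix("hello world", 5, out, 16) would convert only the first five bytes, without the caller having to copy them into a terminated buffer first.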
Tim J. Robbins
ddc1eded85 Prefix the names of members of _RuneLocale and its sub-structures
with ``__'' to avoid polluting the namespace. This doesn't change the
documented rune interface at all, but breaks applications that accessed
_RuneLocale directly.
2004-06-23 07:01:44 +00:00
Tim J. Robbins
ea4ac135ff Allow encoding modules to override the default implementations of
mbsrtowcs() and wcsrtombs(). Provide a fast implementation for the
trivial "NONE" encoding.
2004-05-13 11:20:27 +00:00
Tim J. Robbins
2051a8f2d5 Move prototypes of various encoding-related functions into a new header
file to avoid extern'ing them all over the place.
2004-05-12 14:09:04 +00:00
Tim J. Robbins
ca2dae426e Allow partial multibyte characters to accumulate in conversion state
objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required
by C99.
2004-04-07 10:48:19 +00:00
Tim J. Robbins
4fb9e805dc Remove support for emulating mbrtowc() and wcrtomb() in terms of the
old rune interface now that it is no longer needed.
2004-04-04 11:31:29 +00:00
Tim J. Robbins
40c5c1f8a1 Set __mbrtowc and __wcrtomb correctly when changing to the C/POSIX locale.
Save __mbrtowc and __wcrtomb and restore them when changing back to
the cached locale.

Reported by:	perky
2003-12-08 23:52:22 +00:00
David Xu
6d7a04b013 Add gb2312 encoding.
2003-11-05 22:52:51 +00:00
Tim J. Robbins
d4f6cd06dd Allow mbrtowc() and wcrtomb() to be implemented directly, instead of
as wrappers around the deprecated 4.4BSD rune functions. This paves the
way for state-dependent encodings, which the rune API does not support.
- Add __emulated_sgetrune() and __emulated_sputrune(), which are
  implementations of sgetrune() and sputrune() in terms of
  mbrtowc() and wcrtomb().
- Rename the old rune-wrapper mbrtowc() and wcrtomb() functions to
  __emulated_mbrtowc() and __emulated_wcrtomb().
- Add __mbrtowc and __wcrtomb function pointers, which point to the
  current locale's conversion functions, or the __emulated versions.
- Implement mbrtowc() and wcrtomb() as calls to these function pointers.
- Make the "NONE" encoding implement mbrtowc() and wcrtomb() directly.

All of this emulation mess will be removed, together with rune support,
in FreeBSD 6.
2003-11-01 05:13:13 +00:00
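
The function-pointer dispatch described in the commit above can be sketched roughly as follows. All names here (sketch_cur_mbrtowc, sketch_none_mbrtowc, sketch_mbrtowc) are hypothetical stand-ins, not the identifiers used in libc; the single-byte converter is an assumption about how a trivial "NONE" encoding could behave. Keeping the pointer always set to a valid function is one way to avoid the NULL-pointer crashes that later commits in this log describe.

```c
#include <stddef.h>
#include <wchar.h>

typedef size_t (*mbrtowc_fn)(wchar_t *, const char *, size_t, mbstate_t *);

/* Assumed single-byte converter: each byte maps to one wide character. */
size_t
sketch_none_mbrtowc(wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
{
	(void)ps;			/* stateless encoding */
	if (s == NULL)
		return (0);		/* nothing to reset */
	if (n == 0)
		return ((size_t)-2);	/* incomplete: no bytes available */
	if (pwc != NULL)
		*pwc = (unsigned char)*s;
	return (*s == '\0' ? 0 : 1);
}

/* Points at the current locale's converter; never left NULL. */
mbrtowc_fn sketch_cur_mbrtowc = sketch_none_mbrtowc;

/* The public entry point is just a call through the pointer. */
size_t
sketch_mbrtowc(wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
{
	return (sketch_cur_mbrtowc(pwc, s, n, ps));
}
```

A locale switch would then only need to store a different converter in the pointer, which is what makes state-dependent encodings possible without going through the rune API.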
Andrey A. Chernov
a03081087c Add support for gb18030 encoding
PR:             51729
Submitted by:   Kang Liu <liukang@bjpu.edu.cn>
2003-07-29 07:52:44 +00:00
Andrey A. Chernov
8b2749e901 Add const to __setrunelocale prototype
2003-07-06 04:01:09 +00:00
Andrey A. Chernov
68d429c3fc Reorganize wrapper around setrunelocale() to mark it as deprecated
in FreeBSD 6
2003-07-06 02:03:37 +00:00
Alexey Zelkin
fca2738d67 Reduce code duplication by separating _PathLocale detection code into
an internal helper function.
2003-06-25 22:42:33 +00:00
Andrey A. Chernov
9d793e98ec Add GBK encoding
PR:             51504
Submitted by:   Statue <statue@freebsd.sinica.edu.tw>
2003-06-01 15:30:56 +00:00
Tim J. Robbins
972baa3747 Add a UTF-8 encoding method, which will eventually replace the antique
"UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible
for 16-bit characters, the new UTF-8 implementation is much more strict
about rejecting malformed input and also handles the full 31 bit range
of characters.
2002-10-10 22:56:18 +00:00
Andrey A. Chernov
ec5ca2eba7 Add safeguards to never use errno == 0 as setrunelocale() error return code
2002-08-09 08:22:29 +00:00
Andrey A. Chernov
76692b8025 Rewrite locale loading procedures, so any load failure will not affect
currently cached data.  It allows a number of nice things, like: removing
fallback code from single locale loading, remove memory leak when LC_CTYPE
data loaded again and again, efficient cache use, not only for
setlocale(locale1); setlocale(locale1), but for setlocale(locale1);
setlocale("C"); setlocale(locale1) too (i.e.  data file loaded only once).
2002-08-08 05:51:54 +00:00
Andrey A. Chernov
57473ad215 Reset __mb_cur_max to 1 when "C" or "POSIX" locales loaded after multibyte one
2002-08-07 20:49:25 +00:00
Andrey A. Chernov
2f6754febb Catch empty encoding name too
2002-08-03 17:09:21 +00:00
Andrey A. Chernov
710d708144 Return errno provided by fopen, not always ENOENT.
Return EFTYPE instead of EINVAL for wrong locale file format.
Whitespaces.
2002-08-03 11:55:19 +00:00
Andrey A. Chernov
256ddd5999 Check encoding for ".", ".." and / inside
2002-08-03 10:23:06 +00:00
Andrey A. Chernov
5568219d15 Return EINVAL for NULL or too long encoding, not EFAULT
2002-08-03 09:10:31 +00:00
Andrey A. Chernov
83c9580dbb Return ENAMETOOLONG for long PATH_LOCALE, not EFAULT
2002-08-03 09:07:27 +00:00
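
The encoding-name checks added by the August 2002 commits above might look roughly like the sketch below. The function name check_encoding_name() and the ENC_NAME_MAX limit are hypothetical, and returning EINVAL for the ".", ".." and '/' cases is an assumption; the log only states EINVAL for a NULL or too long name. The path checks presumably keep the name from pointing outside the locale directory.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ENC_NAME_MAX 31		/* hypothetical length limit */

/*
 * Hypothetical validator: returns 0 if the encoding name is acceptable,
 * otherwise an errno value (never 0 on failure).
 */
int
check_encoding_name(const char *enc)
{
	/* NULL or empty name. */
	if (enc == NULL || *enc == '\0')
		return (EINVAL);
	/* Too long for the (assumed) internal buffer. */
	if (strlen(enc) > ENC_NAME_MAX)
		return (EINVAL);
	/* Reject names that could escape the locale directory. */
	if (strcmp(enc, ".") == 0 || strcmp(enc, "..") == 0 ||
	    strchr(enc, '/') != NULL)
		return (EINVAL);
	return (0);
}
```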
David E. O'Brien
333fc21e3c Fix the style of the SCM ID's.
I believe I have made all of libc .c's as consistent as possible.
2002-03-22 21:53:29 +00:00
David E. O'Brien
c05ac53b8b Remove __P() usage.
2002-03-21 22:49:10 +00:00
Andrey A. Chernov
8b96e6c916 Merge XPG4 code into libc
2000-06-03 12:24:08 +00:00
Poul-Henning Kamp
7a55a3c230 I have added support for the BIG5 encoding to libc/libxpg4/mklocale.
The diff is attached below. This is done on the 3.0 source tree.
I have tested this on 2.2-stable before, but I don't have a 3.0 machine
right now.

This patch is mainly to make libc support the BIG5 encoding, and thus add
the zh_TW.BIG5 locale to 3.0.

Submitted by:	Chen Hsiung Chan <frankch@waru.life.nthu.edu.tw>
1998-08-15 12:51:49 +00:00
John Birrell
c61e516832 Add #ifndef __NETBSD_SYSCALLS around calls to issetugid() which
do not exist in NetBSD 1.3.
1998-01-15 09:58:08 +00:00
Andrey A. Chernov
bed2de7d4c Move MSKanji under XPG4 define
1997-09-25 23:18:10 +00:00
Julian Elischer
16f76e6f06 Submitted by: Sin'ichiro MIYATANI / Phase One, Inc <siu@phaseone.co.jp>
Basic support for the Shift JIS encoding of Japanese.
(and one tiny typo fixed in a comment)
1997-09-24 20:38:12 +00:00
Andrey A. Chernov
21b4da0751 Restore PATH_LOCALE functionality using issetugid() call now
1997-04-07 08:54:38 +00:00
Andrey A. Chernov
63407d3487 Use symbolic constants instead of hardcoded digits
Add range check for setrunelocale since it can be called
directly.
Remove _startup_setlocale compatibility function

Should go into 2.2
1997-02-06 09:11:06 +00:00
Andrey A. Chernov
d81a091605 Update the comment on why range checking is not needed
Fix setrunelocale failure when called directly without a prior setlocale
call

Should go in 2.2
1997-02-05 19:17:10 +00:00
Andrey A. Chernov
af155bdff3 Add comment that range checking is already done at upper level
Kill snprintf left in collate.c from previous backout

Should go in 2.2
1996-12-28 05:04:24 +00:00
Joerg Wunsch
42ceaa809f Back out rev 1.5: the overflow condition is already handled elsewhere.
1996-12-22 15:48:06 +00:00
Joerg Wunsch
120e62ec50 Fix yet another buffer overflow. :-(
Vulnerable: all programs that use setlocale(LC_COLLATE),
setlocale(LC_CTYPE), or setlocale(LC_ALL).  The only setuid/setgid
binary I've found for this is w(1).

Should go into 2.2.
1996-12-16 17:32:58 +00:00
Andrey A. Chernov
c8f931a80e PATH_LOCALE: use this non-standard env variable first time only, i.e.
strdup() it to prevent unsetenv() or setenv() effects. Check its length to
not allow user to overflow internal locale buffer. Move PATH_LOCALE
handling code into one place.

POSIX: make a better stub for LC_MONETARY & LC_NUMERIC; it now checks
for the locale directory's existence instead of refusing all non-C, non-POSIX
locales. POSIX treats an empty locale env variable as an unset variable,
while our old code treated it as the "C" locale; fix it. Implement restoring
the previous locale if locale setting fails. The old code assumed success if
some of the LC_ALL subset succeeded even when others failed; POSIX treats this
as a failure and restores the previous locale; fix it.

Remove unnecessary length checking in currentlocale()
1996-11-26 02:49:53 +00:00
Andrey A. Chernov
f9cbdf410e Move more stuff out to XPG4
Handle negative chars inside runetype/tolower/toupper
1995-11-03 08:59:02 +00:00
Andrey A. Chernov
34d08e870e Treat empty encoding as "C" encoding
1995-10-23 20:20:11 +00:00