for wide characters locales in the argument range >= 0x80 - they may
return false positives.
Example 1: for UTF-8 locale we currently have:
iswspace(0xA0)==1 and isspace(0xA0)==1
(because iswspace() and isspace() are the same code)
but must have
iswspace(0xA0)==1 and isspace(0xA0)==0
(because there is no such character and all others in the range
0x80..0xff for the UTF-8 locale, it keeps ASCII only in the single byte
range because our internal wchar_t representation for UTF-8 is UCS-4).
Example 2: for all wide character locales isalpha(arg) when arg > 0xFF may
return false positives (must be 0).
(because iswalpha() and isalpha() are the same code)
This change address this issue separating single byte and wide ctype
and also fix iswascii() (currently iswascii() is broken for
arguments > 0xFF).
This change is 100% binary compatible with old binaries.
Reviewied by: i18n@
. Replace inclusion of sys/param.h to sys/cdefs.h and sys/types.h where
appropriate.
. move _*_init() prototypes to mblocal.h, and remove these prototypes
from .c files
. use _none_init() in __setrunelocale() instead of duplicating code
. move __mb* variables from table.c to none.c allowing us to not to
export _none_*() externs, and appropriately remove them from mblocal.h
Ok'ed by: tjr
convenient when the source string isn't null-terminated.
Implement the other conversion functions (mbstowcs(), mbsrtowcs(), wcstombs(),
wcsrtombs()) in terms of these new functions.
with ``__'' to avoid polluting the namespace. This doesn't change the
documented rune interface at all, but breaks applications that accessed
_RuneLocale directly.
as wrappers around the deprecated 4.4BSD rune functions. This paves the
way for state-dependent encodings, which the rune API does not support.
- Add __emulated_sgetrune() and __emulated_sputrune(), which are
implementations of sgetrune() and sputrune() in terms of
mbrtowc() and wcrtomb().
- Rename the old rune-wrapper mbrtowc() and wcrtomb() functions to
__emulated_mbrtowc() and __emulated_wcrtomb().
- Add __mbrtowc and __wcrtomb function pointers, which point to the
current locale's conversion functions, or the __emulated versions.
- Implement mbrtowc() and wcrtomb() as calls to these function pointers.
- Make the "NONE" encoding implement mbrtowc() and wcrtomb() directly.
All of this emulation mess will be removed, together with rune support,
in FreeBSD 6.
"UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible
for 16-bit characters, the new UTF-8 implementation is much more strict
about rejecting malformed input and also handles the full 31 bit range
of characters.
currently cached data. It allows a number of nice things, like: removing
fallback code from single locale loading, remove memory leak when LC_CTYPE
data loaded again and again, efficient cache use, not only for
setlocale(locale1); setlocale(locale1), but for setlocale(locale1);
setlocale("C"); setlocale(locale1) too (i.e. data file loaded only once).
the diff is attached below. This is done on the 3.0 source-tree.
I have test this on 2.2-stable before, but I don't have a 3.0 machine
right now.
This patch is mainly to make libc support BIG5 encoding, thus add
zh_TW.BIG5 locale to 3.0.
Submitted by: Chen Hsiung Chan <frankch@waru.life.nthu.edu.tw>
Vulnerable: all programs that use setlocale(LC_COLLATE),
setlocale(LC_CTYPE), or setlocale(LC_ALL). The only setuid/setgid
binary i've found for this is w(1).
Should go into 2.2.
strdup() it to prevent unsetenv() or setenv() effects. Check its length to
not allow user to overflow internal locale buffer. Move PATH_LOCALE
handling code into one place.
POSIX: make better stub for LC_MONETARY & LC_NUMERIC, now it check
locale directory existance instead of refusing all non-C non-POSIX
locales. POSIX treats empty locale env variable as unset variable
while our old code treats it as "C" locale, fix it. Implement previous locale
restoring, if locale setting fails. Old code assumes success if some
of LC_ALL subset is successed even other fails, POSIX treats it as
failure with previous locale restoring, fix it.
Remove unneccessary length checking in currentlocale()