freebsd-dev

Author	SHA1	Message	Date
Andrey A. Chernov	4932c895e7	Add comment explaining __mb_sb_limit trick here.	2007-10-15 09:51:30 +00:00
Andrey A. Chernov	367ed4e13d	The problem is: currently our single byte ctype(3) functions are broken for wide character locales in the argument range >= 0x80 - they may return false positives. Example 1: for UTF-8 locale we currently have: iswspace(0xA0)==1 and isspace(0xA0)==1 (because iswspace() and isspace() are the same code) but must have iswspace(0xA0)==1 and isspace(0xA0)==0 (because there is no such character and all others in the range 0x80..0xff for the UTF-8 locale, it keeps ASCII only in the single byte range because our internal wchar_t representation for UTF-8 is UCS-4). Example 2: for all wide character locales isalpha(arg) when arg > 0xFF may return false positives (must be 0) (because iswalpha() and isalpha() are the same code). This change addresses this issue by separating single byte and wide ctype and also fixes iswascii() (currently iswascii() is broken for arguments > 0xFF). This change is 100% binary compatible with old binaries. Reviewed by: i18n@	2007-10-13 16:28:22 +00:00
Tom Rhodes	639dab2286	Fix a bug where, for 6-byte sequences, the top 6 bits get compared to 111111 rather than the top 7 bits being compared against 1111110, causing illegal bytes fe and ff to be treated the same as legal bytes fc and fd.	2006-03-30 09:04:12 +00:00
Alexey Zelkin	e94c6cb4a2	. Static'ize functions exported via function reference variables only. . Replace inclusion of sys/param.h with sys/cdefs.h and sys/types.h where appropriate. . Move __init() prototypes to mblocal.h, and remove these prototypes from .c files. . Use _none_init() in __setrunelocale() instead of duplicating code. . Move __mb variables from table.c to none.c, allowing us not to export _none_*() externs, and remove them from mblocal.h accordingly. Ok'ed by: tjr	2005-02-27 15:11:09 +00:00
Stefan Farfeleder	610b5a1fb1	Fix comparisons that test if an unsigned value is < 0. Reviewed by: tjr	2005-02-12 08:45:12 +00:00
Tim J. Robbins	ea9a9a377b	Add UTF-8-specific implementations of mbsnrtowcs() and wcsnrtombs(). These convert plain ASCII characters in-line, making them only slightly slower than the single-byte ("NONE" encoding) version when processing ASCII strings.	2004-07-27 06:29:48 +00:00
Tim J. Robbins	550473de5b	Add fast paths for conversion of plain ASCII characters.	2004-07-09 15:46:06 +00:00
Tim J. Robbins	5e44d7ebe1	Use conversion state objects to store the accumulated wide character, low bound, and the number of bytes remaining instead of storing the raw byte sequence and deriving them every time mbrtowc() is called. This is much faster -- about twice as fast in some crude benchmarks.	2004-05-17 12:32:40 +00:00
Tim J. Robbins	2051a8f2d5	Move prototypes of various encoding-related functions into a new header file to avoid extern'ing them all over the place.	2004-05-12 14:09:04 +00:00
Tim J. Robbins	fc813796d2	Perform some basic validation of multibyte conversion state objects.	2004-04-12 13:09:18 +00:00
Tim J. Robbins	fa02ee78c8	Don't cast away const qualifiers. Spotted by: bde	2004-04-10 00:27:52 +00:00
Tim J. Robbins	ca2dae426e	Allow partial multibyte characters to accumulate in conversion state objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required by C99.	2004-04-07 10:48:19 +00:00
Tim J. Robbins	b1c572ad5b	Fix a typo that caused mbrtowc() to always return 0.	2003-11-11 07:25:05 +00:00
Tim J. Robbins	02f4f60ad5	Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left unconverted; GB18030 will be done eventually, but GBK and UTF2 may just be removed, as they are subsets of GB18030 and UTF-8 respectively.	2003-11-02 10:09:33 +00:00
Jacques Vidrine	6d7bd75a4e	Whack 28 unused variables.	2003-02-18 13:39:52 +00:00
Tim J. Robbins	972baa3747	Add a UTF-8 encoding method, which will eventually replace the antique "UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible for 16-bit characters, the new UTF-8 implementation is much more strict about rejecting malformed input and also handles the full 31 bit range of characters.	2002-10-10 22:56:18 +00:00

16 Commits
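
The lead-byte bug described in commit 639dab2286 can be illustrated with a small sketch. The function names below are hypothetical and are not taken from the actual source; only the bit patterns (top 6 bits against 111111, top 7 bits against 1111110) come from the commit message.

```c
/*
 * Illustrative sketch only: is_6byte_lead_buggy() and is_6byte_lead_fixed()
 * are hypothetical names, not functions from the FreeBSD source.
 */

/* Buggy check: tests only the top 6 bits, so 0xfc, 0xfd, 0xfe and 0xff all match. */
int
is_6byte_lead_buggy(unsigned char ch)
{
	return ((ch & 0xfc) == 0xfc);
}

/* Fixed check: tests the top 7 bits against 1111110, so only 0xfc and 0xfd match. */
int
is_6byte_lead_fixed(unsigned char ch)
{
	return ((ch & 0xfe) == 0xfc);
}
```

With the narrower mask, the illegal bytes fe and ff are accepted as 6-byte lead bytes; with the 7-bit mask they are rejected while fc and fd are still accepted.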