freebsd-skq

Author	SHA1	Message	Date
pfg	aa4f79bd1b	citrus: Avoid invalid code points. From the OpenBSD log: The UTF-8 decoder should not accept byte sequences which decode to unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF. http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 http://unicode.org/faq/utf_bom.html#utf8-4 Reported by: Stefan Sperling Obtained from: OpenBSD MFC after: 5 days	2014-04-29 15:25:57 +00:00
theraven	0f6ef690b3	Implement xlocale APIs from Darwin, mainly for use by libc++. This adds a load of _l suffixed versions of various standard library functions that use the global locale, making them take an explicit locale parameter. Also adds support for per-thread locales. This work was funded by the FreeBSD Foundation. Please test any code you have that uses the C standard locale functions! Reviewed by: das (gdtoa changes) Approved by: dim (mentor)	2011-11-20 14:45:42 +00:00
ache	c44dc0d639	Add comment explaining __mb_sb_limit trick here.	2007-10-15 09:51:30 +00:00
ache	a5038f060d	The problem is: currently our single byte ctype(3) functions are broken for wide character locales in the argument range >= 0x80 - they may return false positives. Example 1: for UTF-8 locale we currently have: iswspace(0xA0)==1 and isspace(0xA0)==1 (because iswspace() and isspace() are the same code) but must have iswspace(0xA0)==1 and isspace(0xA0)==0 (because there is no such character and all others in the range 0x80..0xff for the UTF-8 locale, it keeps ASCII only in the single byte range because our internal wchar_t representation for UTF-8 is UCS-4). Example 2: for all wide character locales isalpha(arg) when arg > 0xFF may return false positives (must be 0). (because iswalpha() and isalpha() are the same code) This change addresses this issue by separating single byte and wide ctype, and also fixes iswascii() (currently iswascii() is broken for arguments > 0xFF). This change is 100% binary compatible with old binaries. Reviewed by: i18n@	2007-10-13 16:28:22 +00:00
trhodes	dba9e095d4	Fix a bug where, for 6-byte sequences, the top 6 bits were compared to 111111 rather than the top 7 bits being compared against 1111110, causing the illegal bytes fe and ff to be treated the same as the legal bytes fc and fd.	2006-03-30 09:04:12 +00:00
phantom	23d961a13f	. Static'ize functions exported via function reference variables only. . Replace inclusion of sys/param.h with sys/cdefs.h and sys/types.h where appropriate. . Move __init() prototypes to mblocal.h, and remove these prototypes from .c files . Use _none_init() in __setrunelocale() instead of duplicating code . Move __mb variables from table.c to none.c, allowing us not to export _none_*() externs, and appropriately remove them from mblocal.h Ok'ed by: tjr	2005-02-27 15:11:09 +00:00
stefanf	d5719e89ef	Fix comparisons that test if an unsigned value is < 0. Reviewed by: tjr	2005-02-12 08:45:12 +00:00
tjr	b9fa8ef024	Add UTF-8-specific implementations of mbsnrtowcs() and wcsnrtombs(). These convert plain ASCII characters in-line, making them only slightly slower than the single-byte ("NONE" encoding) version when processing ASCII strings.	2004-07-27 06:29:48 +00:00
tjr	0bea5c0108	Add fast paths for conversion of plain ASCII characters.	2004-07-09 15:46:06 +00:00
tjr	aee8349a0c	Use conversion state objects to store the accumulated wide character, low bound, and the number of bytes remaining instead of storing the raw byte sequence and deriving them every time mbrtowc() is called. This is much faster -- about twice as fast in some crude benchmarks.	2004-05-17 12:32:40 +00:00
tjr	e3f042f4af	Move prototypes of various encoding-related functions into a new header file to avoid extern'ing them all over the place.	2004-05-12 14:09:04 +00:00
tjr	8f8a2ad179	Perform some basic validation of multibyte conversion state objects.	2004-04-12 13:09:18 +00:00
tjr	17077e5ae6	Don't cast away const qualifiers. Spotted by: bde	2004-04-10 00:27:52 +00:00
tjr	54a18fa1d6	Allow partial multibyte characters to accumulate in conversion state objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required by C99.	2004-04-07 10:48:19 +00:00
tjr	042225384f	Fix a typo that caused mbrtowc() to always return 0.	2003-11-11 07:25:05 +00:00
tjr	1c3a3f7e26	Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left unconverted; GB18030 will be done eventually, but GBK and UTF2 may just be removed, as they are subsets of GB18030 and UTF-8 respectively.	2003-11-02 10:09:33 +00:00
nectar	e369901c4d	Whack 28 unused variables.	2003-02-18 13:39:52 +00:00
tjr	d62abf19a3	Add a UTF-8 encoding method, which will eventually replace the antique "UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible for 16-bit characters, the new UTF-8 implementation is much more strict about rejecting malformed input and also handles the full 31 bit range of characters.	2002-10-10 22:56:18 +00:00

18 Commits
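The validation rules named in d62abf19a3 (strict rejection of malformed input, full 31-bit range), dba9e095d4 (a 6-byte lead byte must match 1111110x in its top 7 bits) and aa4f79bd1b (no surrogates, U+FFFE or U+FFFF) can be combined into a single-character decoder. This is a minimal sketch under those assumptions, not FreeBSD's utf8.c. The function name decode_utf8_char and its return convention (length on success, -1 for invalid input, -2 for a truncated sequence) are hypothetical. RFC 3629 later limited UTF-8 to four bytes (U+10FFFF). The sketch keeps the older 31-bit range that the log describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 character from s[0..n).
 * Returns its length (1-6) and stores the code point in *out,
 * -1 if the bytes are malformed or forbidden, -2 if s is too short. */
static int decode_utf8_char(const unsigned char *s, size_t n, uint32_t *out)
{
    unsigned char c;
    int len, i;
    uint32_t wch, lbound;   /* lbound: smallest value this length may encode */

    assert(s != NULL && out != NULL);
    if (n == 0)
        return -2;
    c = s[0];
    if ((c & 0x80) == 0) {            /* 0xxxxxxx: plain ASCII */
        *out = c;
        return 1;
    } else if ((c & 0xe0) == 0xc0) {  /* 110xxxxx */
        len = 2; wch = c & 0x1f; lbound = 0x80;
    } else if ((c & 0xf0) == 0xe0) {  /* 1110xxxx */
        len = 3; wch = c & 0x0f; lbound = 0x800;
    } else if ((c & 0xf8) == 0xf0) {  /* 11110xxx */
        len = 4; wch = c & 0x07; lbound = 0x10000;
    } else if ((c & 0xfc) == 0xf8) {  /* 111110xx */
        len = 5; wch = c & 0x03; lbound = 0x200000;
    } else if ((c & 0xfe) == 0xfc) {  /* 1111110x: top 7 bits, so fe/ff fail */
        len = 6; wch = c & 0x01; lbound = 0x4000000;
    } else {
        return -1;                    /* stray continuation byte, or fe/ff */
    }
    if (n < (size_t)len)
        return -2;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)    /* continuation bytes are 10xxxxxx */
            return -1;
        wch = (wch << 6) | (s[i] & 0x3f);
    }
    if (wch < lbound)
        return -1;                    /* overlong encoding */
    if ((wch >= 0xd800 && wch <= 0xdfff) || wch == 0xfffe || wch == 0xffff)
        return -1;                    /* surrogates and U+FFFE/U+FFFF */
    *out = wch;
    return len;
}
```

For example, the bytes ED A0 80, which would decode to the surrogate U+D800, and the overlong C0 80 are both rejected with -1. FD BF BF BF BF BF decodes to 0x7FFFFFFF, the top of the 31-bit range.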
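Commit a5038f060d separates single-byte and wide ctype so that, in a UTF-8 locale, isspace(0xA0) is 0 while iswspace(0xA0) is 1, and isalpha() returns 0 for arguments above 0xFF. One way to express that split is a single-byte classifier that rejects any argument at or above a locale-dependent limit before it consults the wide classification. In this sketch, wide_isspace, sb_isspace and sb_limit are all hypothetical. FreeBSD's own limit variable is __mb_sb_limit (see c44dc0d639), and the sketch does not reproduce how libc actually uses it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wide classifier covering only a few space characters. */
static bool wide_isspace(uint32_t wc)
{
    return wc == 0x20 || (wc >= 0x09 && wc <= 0x0d) ||
           wc == 0xa0 || wc == 0x3000;
}

/* Hypothetical single-byte limit (assumption): 128 for a UTF-8 locale,
 * which keeps only ASCII in the single-byte range, 256 for an 8-bit locale. */
static int sb_limit = 128;

/* Single-byte classifier: anything outside [0, sb_limit), including EOF and
 * values above 0xFF, is rejected before the wide classifier is consulted. */
static bool sb_isspace(int c)
{
    if (c < 0 || c >= sb_limit)
        return false;
    return wide_isspace((uint32_t)c);
}
```

With sb_limit at 128, sb_isspace(0xA0) is false while wide_isspace(0xA0) is true. Raising the limit to 256, as an 8-bit locale might, lets the single-byte check classify 0xA0 as well.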
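Commits 0bea5c0108 and b9fa8ef024 add fast paths that convert plain ASCII characters in-line. This sketch shows the idea for the wide-to-UTF-8 direction. Code points below 0x80 are copied directly, and only other characters go through a general encoder. The names ucs4_to_utf8 and encode_one are hypothetical. The sketch also omits the real wcsnrtombs() interface (NULL destination, conversion state, source pointer update).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slow path: encode one code point (up to 0x7fffffff) as UTF-8.
 * Returns the number of bytes written to dst, or 0 if wc is out of range. */
static size_t encode_one(unsigned char *dst, uint32_t wc)
{
    static const unsigned char lead[7] = {0, 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
    size_t len, i;

    if (wc < 0x80)
        len = 1;
    else if (wc < 0x800)
        len = 2;
    else if (wc < 0x10000)
        len = 3;
    else if (wc < 0x200000)
        len = 4;
    else if (wc < 0x4000000)
        len = 5;
    else if (wc <= 0x7fffffff)
        len = 6;
    else
        return 0;                     /* beyond the 31-bit range */
    if (len == 1) {
        dst[0] = (unsigned char)wc;
        return 1;
    }
    for (i = len - 1; i > 0; i--) {   /* fill continuation bytes from the end */
        dst[i] = (unsigned char)(0x80 | (wc & 0x3f));
        wc >>= 6;
    }
    dst[0] = (unsigned char)(lead[len] | wc);
    return len;
}

/* Convert nwc wide characters into dst (capacity dstlen bytes).
 * Returns bytes written, or (size_t)-1 for an unencodable character. */
static size_t ucs4_to_utf8(unsigned char *dst, size_t dstlen,
                           const uint32_t *src, size_t nwc)
{
    size_t out = 0, k;
    unsigned char buf[6];

    assert(src != NULL || nwc == 0);
    for (k = 0; k < nwc; k++) {
        uint32_t wc = src[k];
        if (wc < 0x80) {              /* fast path: ASCII copied in-line */
            if (out == dstlen)
                break;
            dst[out++] = (unsigned char)wc;
            continue;
        }
        size_t len = encode_one(buf, wc);   /* slow path */
        if (len == 0)
            return (size_t)-1;
        if (dstlen - out < len)       /* never write a partial character */
            break;
        memcpy(dst + out, buf, len);
        out += len;
    }
    return out;
}
```

With room for only three bytes, the input {'h', 'i', 0xE9} yields 2. The two-byte encoding of U+00E9 does not fit, so the loop stops instead of writing half a character.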
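Commits 54a18fa1d6 and aee8349a0c describe keeping a partial character in the conversion state. The state holds the accumulated wide character, its lower bound and the number of bytes still expected, as C99 requires of mbrtowc(). What follows is a minimal sketch of that idea with mbrtowc()-style return values. struct utf8_state and utf8_mbrtowc are hypothetical names. Only 1- to 4-byte sequences are handled, to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion state: accumulated bits, lower bound, and the
 * number of continuation bytes still expected (0 means initial state). */
struct utf8_state {
    uint32_t wch;
    uint32_t lbound;
    int want;
};

/* Returns the number of bytes consumed from s in this call to finish a
 * character, 0 for U+0000, (size_t)-2 if s ran out mid-character (the
 * partial character is kept in *st), or (size_t)-1 on malformed input. */
static size_t utf8_mbrtowc(uint32_t *pwc, const unsigned char *s, size_t n,
                           struct utf8_state *st)
{
    size_t i = 0;

    assert(pwc != NULL && s != NULL && st != NULL);
    if (n == 0)
        return (size_t)-2;
    if (st->want == 0) {              /* start a new character */
        unsigned char c = s[i++];
        if ((c & 0x80) == 0) {
            *pwc = c;
            return c == 0 ? 0 : 1;
        } else if ((c & 0xe0) == 0xc0) {
            st->want = 1; st->wch = c & 0x1f; st->lbound = 0x80;
        } else if ((c & 0xf0) == 0xe0) {
            st->want = 2; st->wch = c & 0x0f; st->lbound = 0x800;
        } else if ((c & 0xf8) == 0xf0) {
            st->want = 3; st->wch = c & 0x07; st->lbound = 0x10000;
        } else {
            return (size_t)-1;        /* 5- and 6-byte forms omitted */
        }
    }
    while (st->want > 0) {            /* resume or continue accumulating */
        if (i == n)
            return (size_t)-2;        /* partial character stays in *st */
        if ((s[i] & 0xc0) != 0x80) {
            st->want = 0;
            return (size_t)-1;
        }
        st->wch = (st->wch << 6) | (s[i++] & 0x3f);
        st->want--;
    }
    if (st->wch < st->lbound)
        return (size_t)-1;            /* overlong encoding */
    *pwc = st->wch;
    return i;
}
```

Feeding the bytes of U+20AC (E2 82 AC) in two calls, first two bytes and then one, returns (size_t)-2 and then 1, with 0x20AC stored by the second call. Because the state keeps decoded bits rather than raw bytes, the second call does not re-parse E2 82, which is the speed-up that aee8349a0c reports.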