freebsd-dev

Author	SHA1	Message	Date
Pedro F. Giffuni	d915a14ef0	libc: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using mis-identified many licenses so this was mostly a manual, error-prone task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-25 17:12:48 +00:00
Baptiste Daroussin	23a32822d2	Merge from HEAD	2015-08-25 20:14:50 +00:00
Ed Schouten	57c69b1478	Make UTF-8 parsing and generation more strict. - in mbrtowc() we need to disallow codepoints above 0x10ffff. - In wcrtomb() we need to disallow codepoints between 0xd800 and 0xdfff. Reviewed by: bapt Differential Revision: https://reviews.freebsd.org/D3399	2015-08-25 09:16:09 +00:00
Baptiste Daroussin	81eb7d7e4b	Re-add checking of UTF-16 surrogates, which are invalid in UTF-8	2015-08-09 10:36:25 +00:00
Baptiste Daroussin	764a768e16	Merge from HEAD	2015-08-09 00:15:17 +00:00
Baptiste Daroussin	8bb93485fb	Remove 5- and 6-byte sequences, which are illegal in UTF-8 space. (part2) Per RFC 3629, values greater than 0x10ffff should be rejected. Suggested by: jilles	2015-08-09 00:06:56 +00:00
Baptiste Daroussin	c9d24bcfd5	Remove 5- and 6-byte sequences, which are illegal in UTF-8 space. Per RFC 3629, values greater than 0x10ffff should be rejected. Suggested by: jilles	2015-08-08 23:59:15 +00:00
Baptiste Daroussin	7b2473410f	Revamp CTYPE support (from Illumos & Dragonfly) Obtained from: Dragonfly	2015-08-08 18:22:14 +00:00
Pedro F. Giffuni	0716c0ff7a	minor perf enhancement for UTF-8 Reduce some duplicate code. Reference: https://www.illumos.org/issues/628 Obtained from: Illumos MFC after: 1 week	2014-07-04 22:39:39 +00:00
Pedro F. Giffuni	0f5132cd25	citrus: Avoid invalid code points. From the OpenBSD log: The UTF-8 decoder should not accept byte sequences which decode to unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF. http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 http://unicode.org/faq/utf_bom.html#utf8-4 Reported by: Stefan Sperling Obtained from: OpenBSD MFC after: 5 days	2014-05-01 01:42:48 +00:00
Pedro F. Giffuni	97ecaa8907	citrus: Avoid invalid code points. From the OpenBSD log: The UTF-8 decoder should not accept byte sequences which decode to unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF. http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 http://unicode.org/faq/utf_bom.html#utf8-4 Reported by: Stefan Sperling Obtained from: OpenBSD MFC after: 5 days	2014-04-29 15:25:57 +00:00
David Chisnall	3c87aa1d3d	Implement xlocale APIs from Darwin, mainly for use by libc++. This adds a load of _l suffixed versions of various standard library functions that use the global locale, making them take an explicit locale parameter. Also adds support for per-thread locales. This work was funded by the FreeBSD Foundation. Please test any code you have that uses the C standard locale functions! Reviewed by: das (gdtoa changes) Approved by: dim (mentor)	2011-11-20 14:45:42 +00:00
Andrey A. Chernov	4932c895e7	Add comment explaining __mb_sb_limit trick here.	2007-10-15 09:51:30 +00:00
Andrey A. Chernov	367ed4e13d	The problem is: currently our single byte ctype(3) functions are broken for wide-character locales in the argument range >= 0x80 - they may return false positives. Example 1: for UTF-8 locale we currently have: iswspace(0xA0)==1 and isspace(0xA0)==1 (because iswspace() and isspace() are the same code) but must have iswspace(0xA0)==1 and isspace(0xA0)==0 (because there is no such character and all others in the range 0x80..0xff for the UTF-8 locale, it keeps ASCII only in the single byte range because our internal wchar_t representation for UTF-8 is UCS-4). Example 2: for all wide-character locales isalpha(arg) when arg > 0xFF may return false positives (must be 0) because iswalpha() and isalpha() are the same code. This change addresses this issue by separating single byte and wide ctype and also fixes iswascii() (currently iswascii() is broken for arguments > 0xFF). This change is 100% binary compatible with old binaries. Reviewed by: i18n@	2007-10-13 16:28:22 +00:00
Tom Rhodes	639dab2286	Fix a bug where, for 6-byte sequences, the top 6 bits get compared to 111111 rather than the top 7 bits being compared against 1111110, causing the illegal bytes fe and ff to be treated the same as the legal bytes fc and fd.	2006-03-30 09:04:12 +00:00
Alexey Zelkin	e94c6cb4a2	. Static'ize functions exported via function reference variables only. . Replace inclusion of sys/param.h with sys/cdefs.h and sys/types.h where appropriate. . move __init() prototypes to mblocal.h, and remove these prototypes from .c files . use _none_init() in __setrunelocale() instead of duplicating code . move __mb variables from table.c to none.c allowing us not to export _none_*() externs, and appropriately remove them from mblocal.h Ok'ed by: tjr	2005-02-27 15:11:09 +00:00
Stefan Farfeleder	610b5a1fb1	Fix comparisons that test if an unsigned value is < 0. Reviewed by: tjr	2005-02-12 08:45:12 +00:00
Tim J. Robbins	ea9a9a377b	Add UTF-8-specific implementations of mbsnrtowcs() and wcsnrtombs(). These convert plain ASCII characters in-line, making them only slightly slower than the single-byte ("NONE" encoding) version when processing ASCII strings.	2004-07-27 06:29:48 +00:00
Tim J. Robbins	550473de5b	Add fast paths for conversion of plain ASCII characters.	2004-07-09 15:46:06 +00:00
Tim J. Robbins	5e44d7ebe1	Use conversion state objects to store the accumulated wide character, low bound, and the number of bytes remaining instead of storing the raw byte sequence and deriving them every time mbrtowc() is called. This is much faster -- about twice as fast in some crude benchmarks.	2004-05-17 12:32:40 +00:00
Tim J. Robbins	2051a8f2d5	Move prototypes of various encoding-related functions into a new header file to avoid extern'ing them all over the place.	2004-05-12 14:09:04 +00:00
Tim J. Robbins	fc813796d2	Perform some basic validation of multibyte conversion state objects.	2004-04-12 13:09:18 +00:00
Tim J. Robbins	fa02ee78c8	Don't cast away const qualifiers. Spotted by: bde	2004-04-10 00:27:52 +00:00
Tim J. Robbins	ca2dae426e	Allow partial multibyte characters to accumulate in conversion state objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required by C99.	2004-04-07 10:48:19 +00:00
Tim J. Robbins	b1c572ad5b	Fix a typo that caused mbrtowc() to always return 0.	2003-11-11 07:25:05 +00:00
Tim J. Robbins	02f4f60ad5	Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left unconverted; GB18030 will be done eventually, but GBK and UTF2 may just be removed, as they are subsets of GB18030 and UTF-8 respectively.	2003-11-02 10:09:33 +00:00
Jacques Vidrine	6d7bd75a4e	Whack 28 unused variables.	2003-02-18 13:39:52 +00:00
Tim J. Robbins	972baa3747	Add a UTF-8 encoding method, which will eventually replace the antique "UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible for 16-bit characters, the new UTF-8 implementation is much more strict about rejecting malformed input and also handles the full 31 bit range of characters.	2002-10-10 22:56:18 +00:00

28 Commits
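The UTF-8 validity rules that accumulate across these commits can be illustrated with a minimal C sketch. These rules are the RFC 3629 ceiling of 0x10ffff and the removal of 5- and 6-byte sequences (c9d24bcfd5, 8bb93485fb), the rejection of UTF-16 surrogates in both directions (81eb7d7e4b, 57c69b1478), and the "low bound" kept in the conversion state since 5e44d7ebe1, which rejects overlong forms. The helpers utf8_decode() and utf8_encode() and their return conventions are hypothetical. They are not FreeBSD's actual mbrtowc()/wcrtomb() code: they omit mbstate_t handling and, unlike the 2014 citrus commits, do not reject U+FFFE or U+FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not FreeBSD's implementation): decode one UTF-8
 * sequence from s[0..n). Returns the number of bytes consumed (1-4),
 * -1 for an invalid sequence, or -2 if the input ends mid-sequence. */
static int utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t wch, lbound;
    unsigned char c;
    int len, i;

    if (n == 0)
        return -2;
    c = s[0];
    if (c < 0x80) {                 /* plain ASCII: one byte */
        *out = c;
        return 1;
    } else if ((c & 0xe0) == 0xc0) {
        len = 2; wch = c & 0x1f; lbound = 0x80;
    } else if ((c & 0xf0) == 0xe0) {
        len = 3; wch = c & 0x0f; lbound = 0x800;
    } else if ((c & 0xf8) == 0xf0) {
        len = 4; wch = c & 0x07; lbound = 0x10000;
    } else {
        /* Stray continuation byte, or 0xf8-0xff: the old 5- and 6-byte
         * lead bytes are rejected outright, per RFC 3629. */
        return -1;
    }
    for (i = 1; i < len; i++) {
        if ((size_t)i >= n)
            return -2;              /* truncated, but valid so far */
        if ((s[i] & 0xc0) != 0x80)
            return -1;              /* not a continuation byte */
        wch = (wch << 6) | (s[i] & 0x3f);
    }
    if (wch < lbound)               /* overlong encoding */
        return -1;
    if (wch >= 0xd800 && wch <= 0xdfff)  /* UTF-16 surrogate */
        return -1;
    if (wch > 0x10ffff)             /* beyond the Unicode range */
        return -1;
    *out = wch;
    return len;
}

/* Hypothetical helper: encode wc into buf (room for at least 4 bytes).
 * Returns the number of bytes written, or -1 for a surrogate or a value
 * above 0x10ffff. */
static int utf8_encode(uint32_t wc, unsigned char *buf)
{
    if ((wc >= 0xd800 && wc <= 0xdfff) || wc > 0x10ffff)
        return -1;
    if (wc < 0x80) {
        buf[0] = (unsigned char)wc;
        return 1;
    }
    if (wc < 0x800) {
        buf[0] = (unsigned char)(0xc0 | (wc >> 6));
        buf[1] = (unsigned char)(0x80 | (wc & 0x3f));
        return 2;
    }
    if (wc < 0x10000) {
        buf[0] = (unsigned char)(0xe0 | (wc >> 12));
        buf[1] = (unsigned char)(0x80 | ((wc >> 6) & 0x3f));
        buf[2] = (unsigned char)(0x80 | (wc & 0x3f));
        return 3;
    }
    buf[0] = (unsigned char)(0xf0 | (wc >> 18));
    buf[1] = (unsigned char)(0x80 | ((wc >> 12) & 0x3f));
    buf[2] = (unsigned char)(0x80 | ((wc >> 6) & 0x3f));
    buf[3] = (unsigned char)(0x80 | (wc & 0x3f));
    return 4;
}
```

The -1 and -2 results loosely mirror the (size_t)-1 (EILSEQ) and (size_t)-2 (incomplete) returns of mbrtowc(). A real implementation would also carry a partial character across calls in an mbstate_t, as ca2dae426e describes.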