freebsd-dev

Author	SHA1	Message	Date
Andrey A. Chernov	e08c3b7c11	EUC-type encodings don't have single byte characters >= 128. This change should not be MFCed until new collate will be MFCed first, because our old EUC tables have some hacks for missing codesets.	2016-04-04 02:43:35 +00:00
Baptiste Daroussin	e58504783b	Fix mbtowc not setting EILSEQ on an Incomplete multibyte sequence for eucJP encoding	2015-11-02 22:56:24 +00:00
Baptiste Daroussin	d8ed03efe5	locales: Fix eucJP sorting (broken upstream?) Sorting eucJP text with "sort" resulted in an illegal sequence while "gsort" worked. This was traced back to mbrtowc handling which was broken for eucJP (probably eucCN, eucKR, and eucTW as well). This small fix took hours to figure out. The OR operation to build the wide character requires an unsigned character to work correctly. The euc wcrtowc conversion is probably broken upstream in Illumos as well. Triggered by: misc/freebsd-doc-ja in ports (encoded in eucJP) Submitted by: marino Obtained from: DragonflyBSD	2015-11-01 21:02:30 +00:00
Baptiste Daroussin	7b2473410f	Revamp CTYPE support (from Illumos & Dragonfly) Obtained from: Dragonfly	2015-08-08 18:22:14 +00:00
David Chisnall	3c87aa1d3d	Implement xlocale APIs from Darwin, mainly for use by libc++. This adds a load of _l suffixed versions of various standard library functions that use the global locale, making them take an explicit locale parameter. Also adds support for per-thread locales. This work was funded by the FreeBSD Foundation. Please test any code you have that uses the C standard locale functions! Reviewed by: das (gdtoa changes) Approved by: dim (mentor)	2011-11-20 14:45:42 +00:00
Andrey A. Chernov	367ed4e13d	The problem is: currently our single byte ctype(3) functions are broken for wide character locales in the argument range >= 0x80 - they may return false positives. Example 1: for UTF-8 locale we currently have: iswspace(0xA0)==1 and isspace(0xA0)==1 (because iswspace() and isspace() are the same code) but must have iswspace(0xA0)==1 and isspace(0xA0)==0 (because there is no such character and all others in the range 0x80..0xff for the UTF-8 locale, it keeps ASCII only in the single byte range because our internal wchar_t representation for UTF-8 is UCS-4). Example 2: for all wide character locales isalpha(arg) when arg > 0xFF may return false positives (must be 0), because iswalpha() and isalpha() are the same code. This change addresses this issue by separating single byte and wide ctype and also fixes iswascii() (currently iswascii() is broken for arguments > 0xFF). This change is 100% binary compatible with old binaries. Reviewed by: i18n@	2007-10-13 16:28:22 +00:00
Alexey Zelkin	e94c6cb4a2	. Static'ize functions exported via function reference variables only. . Replace inclusion of sys/param.h to sys/cdefs.h and sys/types.h where appropriate. . move __init() prototypes to mblocal.h, and remove these prototypes from .c files . use _none_init() in __setrunelocale() instead of duplicating code . move __mb variables from table.c to none.c allowing us to not to export _none_*() externs, and appropriately remove them from mblocal.h Ok'ed by: tjr	2005-02-27 15:11:09 +00:00
Tim J. Robbins	ddc1eded85	Prefix the names of members of _RuneLocale and its sub-structures with ``__'' to avoid polluting the namespace. This doesn't change the documented rune interface at all, but breaks applications that accessed _RuneLocale directly.	2004-06-23 07:01:44 +00:00
Tim J. Robbins	c05bd9ae25	Buffer partial wide characters more efficiently: instead of storing the multibyte representation in conversion state objects, store the accumulated wide character, set number and number of bytes remaining to avoid having to derive them every time mbrtowc() is called.	2004-05-27 10:54:34 +00:00
Tim J. Robbins	2051a8f2d5	Move prototypes of various encoding-related functions into a new header file to avoid extern'ing them all over the place.	2004-05-12 14:09:04 +00:00
Tim J. Robbins	88af941a73	In the absence of proper validation, at least check that null bytes do not appear as anything but the first byte of a multibyte character.	2004-05-11 14:08:22 +00:00
Tim J. Robbins	fc813796d2	Perform some basic validation of multibyte conversion state objects.	2004-04-12 13:09:18 +00:00
Tim J. Robbins	fa02ee78c8	Don't cast away const qualifiers. Spotted by: bde	2004-04-10 00:27:52 +00:00
Tim J. Robbins	ca2dae426e	Allow partial multibyte characters to accumulate in conversion state objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required by C99.	2004-04-07 10:48:19 +00:00
Tim J. Robbins	9e0bd333f0	Remove unused #includes.	2003-11-08 02:58:37 +00:00
Tim J. Robbins	02f4f60ad5	Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left unconverted; GB18030 will be done eventually, but GBK and UTF2 may just be removed, as they are subsets of GB18030 and UTF-8 respectively.	2003-11-02 10:09:33 +00:00
Andrey A. Chernov	ec5ca2eba7	Add safeguards to never use errno == 0 as setrunelocale() error return code	2002-08-09 08:22:29 +00:00
Andrey A. Chernov	76692b8025	Rewrite locale loading procedures, so any load failure will not affect currently cached data. It allows a number of nice things, like: removing fallback code from single locale loading, remove memory leak when LC_CTYPE data loaded again and again, efficient cache use, not only for setlocale(locale1); setlocale(locale1), but for setlocale(locale1); setlocale("C"); setlocale(locale1) too (i.e. data file loaded only once).	2002-08-08 05:51:54 +00:00
Andrey A. Chernov	45206d5c69	Fix wrong address when EucInfo > "variable" size	2002-08-07 20:20:56 +00:00
Jeroen Ruigrok van der Werven	eb12e52a25	Remove the hard-coded limit of 3 bytes for EUC encodings. Satoshi NIIMI-san kindly explained that EUC does not limit the byte length to any arbitrary number. We now set the limit to the maximum octet length of the codeset and it is locale-specific. Submitted by: Yong-Jhen Hong <winard@ms11.url.com.tw>	2002-04-14 10:55:42 +00:00
Jeroen Ruigrok van der Werven	a243e676fe	Fix EUC encoding conversion for codeset 3 and 4 to comply to the specification. PR: 28552 Submitted by: NIIMI Satoshi <sa2c@and.or.jp>	2002-04-07 16:37:15 +00:00
David E. O'Brien	333fc21e3c	Fix the style of the SCM ID's. I believe I have made all of libc .c's as consistent as possible.	2002-03-22 21:53:29 +00:00
David E. O'Brien	c05ac53b8b	Remove __P() usage.	2002-03-21 22:49:10 +00:00
Andrey A. Chernov	8b96e6c916	Merge XPG4 code into libc	2000-06-03 12:24:08 +00:00
John Birrell	da8a9b61c7	Include string.h for memcpy function prototype.	1998-01-14 08:14:56 +00:00
Andrey A. Chernov	350a3d3e48	Migrate from XPG4 to XPG3 (libxpg4 will be added soon) Remove big part of my startup_setlocale hack. Add missing manpage links.	1995-10-23 01:34:17 +00:00
Rodney W. Grimes	58f0484fa2	BSD 4.4 Lite Lib Sources	1994-05-27 05:00:24 +00:00

27 Commits