freebsd-dev

Author	SHA1	Message	Date
Andrey A. Chernov	12eae8c8f3	1) Eliminate possibility to call __*collate_range_cmp() with incomplete locale (which cause core dump) by removing whole 'table' argument by which it passed. 2) Restore __collate_range_cmp() in __sccl(). 3) Collating [a-z] range in regcomp() only for single bytes locales (we can't do it now for other ones). In previous state only first 256 wchars are considered and all others are just silently dropped from the range.	2016-07-14 09:07:25 +00:00
Andrey A. Chernov	1daad8f5ad	Back out non-collating [a-z] ranges. Instead of changing whole course to another POSIX-permitted way for consistency and uniformity I decide to completely ignore missing regex functionality and concentrate on fixing bugs in what we have now, too many small obstacles instead, counting ports.	2016-07-14 08:18:12 +00:00
Andrey A. Chernov	5a5807dd4c	Remove broken support for collation in [a-z] type ranges. Only first 256 wide chars are considered currently, all other are just dropped from the range. Proper implementation require reverse tables database lookup, since objects are really big as max UTF-8 (1114112 code points), so just the same scanning as it was for 256 chars will slow things down. POSIX does not require collation for [a-z] type ranges and does not prohibit it for non-POSIX locales. POSIX require collation for ranges only for POSIX (or C) locale which is equal to ASCII and binary for other chars, so we already have it. No other *BSD implements collation for [a-z] type ranges. Restore ABI compatibility with unused now __collate_range_cmp() which is visible from outside (will be removed later).	2016-07-10 03:49:38 +00:00
Baptiste Daroussin	a8bacd6f93	Fix a bad test resulting in a segfault with ISO-8859-5 locales Reported by: Lauri Tirkkonen from Illumos Approved by: re@ (gjb)	2016-07-03 15:00:12 +00:00
Pedro F. Giffuni	3c2c0c0443	libc/locale: Fix type breakage in __collate_range_cmp(). When collation support was brought in, the second and third arguments in __collate_range_cmp() were changed from int to wchar_t, breaking the ABI. Change them to a "char" type which makes more sense and keeps the ABI compatible. Also introduce __wcollate_range_cmp() which does work with wide characters. This function is used only internally in libc so we don't export it. Use the new function in glob(3), fnmatch(3), and regexec(3). PR: 179721 Suggested by: ache. jilles MFC after: 3 weeks (perhaps partial only)	2016-06-05 19:12:52 +00:00
Andrey A. Chernov	2f423a266a	For EILSEQ case in mbsnrtowcs() and wcsnrtombs() update src to point to the character after the one this conversion stopped at. PR: 209907 Submitted by: Roel Standaert <roel@abittechnical.com> (partially) MFC after: 3 days	2016-05-31 18:44:33 +00:00
Pedro F. Giffuni	32223c1b7d	libc: spelling fixes. Mostly on comments.	2016-04-30 01:24:24 +00:00
Baptiste Daroussin	49c4407313	Restore the original ascii.c from prior to r290494 It was doing the right thing, there was no need to "fail" to reinvent it from none.c Pointy hat: bapt Submitted by: ache	2016-04-21 07:36:11 +00:00
Baptiste Daroussin	d3591d68a9	Check the returned value of memchr(3) before using it Reported by: Coverity CID: 1338530	2016-04-20 20:44:30 +00:00
Pedro F. Giffuni	513004a23d	libc: replace 0 with NULL for pointers. While here also cleanup some surrounding code; particularly drop some malloc() casts. Found with devel/coccinelle. Reviewed by: bde (previous version - all new bugs are mine)	2016-04-10 19:33:58 +00:00
Andrey A. Chernov	ae7abb26b1	SJIS encoding don't have single byte characters >= 224 MFC after: 1 week	2016-04-04 15:56:14 +00:00
Andrey A. Chernov	e08c3b7c11	EUC-type encodings don't have single byte characters >= 128 This change should not be MFCed until new collate will be MFCed first, because our old EUC tables have some hacks for missing codesets.	2016-04-04 02:43:35 +00:00
Pedro F. Giffuni	45256214eb	mbtowc(3): set errno to EILSEQ if an incomplete character is passed. According to POSIX, The mbtowc() function shall fail if: [EILSEQ] An invalid character sequence is detected. Reviewed by: bapt Differential Revision: https://reviews.freebsd.org/D5496 Obtained from: OpenBSD (Ingo Schwarze) MFC after: 1 month	2016-03-01 19:15:34 +00:00
Enji Cooper	0b5cc81d3b	Link localeconv(3) to localeconv_l(3) MFC after: 3 days	2015-11-25 09:12:30 +00:00
Baptiste Daroussin	87101cb572	return "US-ASCII" instead of "POSIX" for "C" and "POSIX" locales as it used to be in previous version of the locales. Returning "POSIX" has too many fallouts.	2015-11-10 08:11:27 +00:00
Baptiste Daroussin	403105944d	nl_langinfo: Simplify case ladder The NONE:US-ASCII case isn't necessary. The "NONE:" case will handle US-ASCII, so let's remove the redundant handling. Submitted by: marino Obtained from: DragonflyBSD	2015-11-09 22:29:47 +00:00
Baptiste Daroussin	22b87a3555	Readd ascii.c forgotten in r290618	2015-11-09 22:11:37 +00:00
Baptiste Daroussin	473aa0b7ee	locales: Enforce US-ASCII encoding (limited to 7-bit) The US-ASCII format was getting treated identically to POSIX. It is supposed to throw an ILSEQ errno if a value of 0x80 or greater is encountered, so let's bring back the "ASCII" handling. While here, change nl_codeset to return US-ASCII only when the encoding really is "US-ASCII". Before "C" and "POSIX" encoding returned this string, so now they return "POSIX". Discussed with: ache Submitted by: marino Obtained from: DragonflyBSD	2015-11-09 22:06:22 +00:00
Baptiste Daroussin	e58504783b	Fix mbtowc not setting EILSEQ on an Incomplete multibyte sequence for eucJP encoding	2015-11-02 22:56:24 +00:00
Baptiste Daroussin	d8ed03efe5	locales: Fix eucJP sorting (broken upstream?) Sorting eucJP text with "sort" resulted in an illegal sequence while "gsort" worked. This was traced back to mbrtowc handling which was broken for eucJP (probably eucCN, eucKR, and eucTW as well). This small fix took hours to figure out. The OR operation to build the wide character requires an unsigned character to work correctly. The euc wcrtowc conversion is probably broken upstream in Illumos as well. Triggered by: misc/freebsd-doc-ja in ports (encoded in eucJP) Submitted by: marino Obtained from: DragonflyBSD	2015-11-01 21:02:30 +00:00
Baptiste Daroussin	d79cdd21de	libc: Fix (and improve) nl_langinfo (CODESET) The output of "locale charmap" is identical to the result of nl_langinfo (CODESET) for any given locale. The logic for returning the codeset was very simplistic. It just returned portion of the locale name after the period (e.g. en_FR.ISO8859-1 returned "ISO8859-1"). When softlinks were added to locales, this broke. e.g.: en_US returned "" en_FR.UTF8 returned "UTF8" en_FR.UTF-8 returned "UTF-8" zh_Hant_HK.Big5HKSCS returned "Big5HKSCS" zh_Hant_TW.Big5 returned "Big5" es_ES@euro returned "" In order to fix this properly, the named locale cannot be used to determine the encoding. This information was almost available in the rune data. Unfortunately, all the single byte encodings were listed as "NONE" encoding. So I adjusted localedef tool to provide more information about the encoding. For example, instead of "NONE", the LC_CTYPE used by fr_FR.ISO8859-15 is now encoded as "NONE:ISO8859-15". The locale handlers now check if the first four characters of the encoding is "NONE" and if so, treats it as a single-byte encoding. The nl_langinfo handling of CODESET was adjusting accordingly. Now the following is returned: en_US returns "ISO8859-1" fr_FR.UTF8 returns "UTF-8" fr_FR.UTF-8 returns "UTF-8" zh_Hant_HK.Big5HKSCS returns "Big5" zh_Hant_TW.Big5 returns "Big5" es_ES@euro returns "ISO8859-15" as before, "C" and "POSIX" locales return "US-ASCII". This is a big improvement. The result of nl_langinfo can never be a zero-length string and it will always exclusively one of the values of the character maps of /usr/src/tools/tools/locale/etc/final-maps. Submitted by: marino Obtained from: DragonflyBSD	2015-11-01 12:00:55 +00:00
Baptiste Daroussin	76e6db686e	collate: Fix expansion substitutions (broken upstream too) Through testing, the user noted that some Cyrillic characters were not sorting correctly, and this was confirmed. After extensive testing and review, the localedef tool was eliminated as the culprit. The substitutions were encoded correctly in LC_COLLATE. The error was mainly in wcscoll where character expansions were mishandled. The main directive pass routines had to be written to go back for a new collation value when the "state" variable was set. Before pointers were being advanced, the second lookup was getting applied to the wrong character, etc. The "eat expansion codes" section on collate.c also had a bug. Later on, the "state" variable logic was changed to only set if next code was greater than zero (rather than >= 0). Some additional cleanups got captured from previous work: 1) The previous commit moved the binary search comment from the correct location to a wrong location because it's wrong upstream in Illumos. The comment has little value so I just removed it. 2) Don't check if pointers are null before freeing, this is redundant as free() handles null pointers. 3) The two binary search trees were standardized wrt initialization 4) On the binary search trees, a negative "high" exits rather than checking the table count again. Submitted by: marino Obtained from: DragonflyBSD	2015-10-23 23:24:03 +00:00
Baptiste Daroussin	332fe83717	libc/collate: minor tweaks / fix The main "fix" here is properly setting a collate loading error for each early return. Tweaks include removing unnecessary null checks, adding assertions (from Illumos) and a couple of variables to reduces code differences and improve readability. For normal use, there are no functional changes here. Obtained from: DragonflyBSD, Illumos	2015-10-22 14:29:19 +00:00
Baptiste Daroussin	c25f5140e9	Include sys/*.h earlier Reported by: kib	2015-10-14 12:46:05 +00:00
Baptiste Daroussin	f5dde0166d	Commit log from Dragonfly: FreeBSD extended ctypes to include numbers (e.g. isnumber()) but never actually implemented it. The isnumber() function was equivalent to the isdigit() function in every case. Now that DragonFly's ctype source files have number definitions, the number ctype can finally be implemented. It's given a new flag _CTYPE_N. The isalnum() and iswalnum() functions have been changed to use this flag rather than the _CTYPE_D digit flag. While isalnum(), isnumber(), and their wide equivalents now return different values in locale cases, the ishexnumber() and iswhexnumber() functions are unchanged. They are still aliases for isxdigit() and iswxdigit(). Also change ctype.h for isdigit and isxdigit to use sbistype like the other functions. Obtained from: dragonfly	2015-10-13 20:43:49 +00:00
Baptiste Daroussin	becbad1f6e	Merge from head	2015-10-13 19:44:36 +00:00
Craig Rodrigues	c83f3fc4b4	Use ANSI C prototypes. Eliminates -Wold-style-definition warnings.	2015-09-20 20:50:18 +00:00
Baptiste Daroussin	23a32822d2	Merge from HEAD	2015-08-25 20:14:50 +00:00
Ed Schouten	57c69b1478	Make UTF-8 parsing and generation more strict. - in mbrtowc() we need to disallow codepoints above 0x10ffff. - In wcrtomb() we need to disallow codepoints between 0xd800 and 0xdfff. Reviewed by: bapt Differential Revision: https://reviews.freebsd.org/D3399	2015-08-25 09:16:09 +00:00
Baptiste Daroussin	eaa94ab419	Fix typo	2015-08-09 12:20:22 +00:00
Baptiste Daroussin	28a20bb3f5	Use more asprintf Plug memory leak introduced in previous asprintf addition	2015-08-09 12:13:30 +00:00
Baptiste Daroussin	b89704cee7	Use asprintf/free instead of snprintf	2015-08-09 11:50:50 +00:00
Baptiste Daroussin	5e4bbc69de	Remove useless variable	2015-08-09 11:47:01 +00:00
Baptiste Daroussin	81eb7d7e4b	Readd checking utf16 surrogates that are invalid in utf8	2015-08-09 10:36:25 +00:00
Baptiste Daroussin	a6d2922cbb	Mark __collate_load_tables_l as static Remove useless addition to Symbols.map	2015-08-09 10:24:24 +00:00
Baptiste Daroussin	536451f914	Fix typo Fix bad location for include Reported by: jilles	2015-08-09 00:21:59 +00:00
Baptiste Daroussin	764a768e16	Merge from HEAD	2015-08-09 00:15:17 +00:00
Baptiste Daroussin	8bb93485fb	Remove 5 and 6 bytes sequences which are illegal in UTF-8 space. (part2) Per rfc3629 value greater than 0x10ffff should be rejected Suggested by: jilles	2015-08-09 00:06:56 +00:00
Baptiste Daroussin	c9d24bcfd5	Remove 5 and 6 bytes sequences which are illegal in UTF-8 space. Per rfc3629 value greater than 0x10ffff should be rejected Suggested by: jilles	2015-08-08 23:59:15 +00:00
Baptiste Daroussin	d0a68f8d38	Fix typo	2015-08-08 23:17:10 +00:00
Baptiste Daroussin	7b2473410f	Revamp CTYPE support (from Illumos & Dragonfly) Obtained from: Dragonfly	2015-08-08 18:22:14 +00:00
Baptiste Daroussin	2a6abeebef	The collate functions within libc have been using version 1 and 1.2 of the packed LC_COLLATE binary formats. These were generated with the colldef tool, but the new LC_COLLATE files are going to be generated by the new localedef tool using CLDR POSIX files as input. The BSD-flavored version of localedef identifies the format as "BSD 1.0". Any LC_COLLATE file with a different version will simply not be loaded, and all LC* categories will get set to "C" (aka "POSIX") locale. This work is based off of Nexenta's contribution to Illumos. The integration with xlocale is John Marino's work for Dragonfly. The following commits will enable localedef tool, disable the colldef tool, add generated colldef directory, and finally remove colldef from base. The only difference with Dragonfly are: - a few fixes to build with clang - And identification of the flavor as "BSD 1.0" instead of "Dragonfly 4.4" Obtained from: Dragonfly	2015-08-07 23:41:26 +00:00
David Chisnall	710ec77e3a	__xlocale_C_ctype should not be const. It contains a reference count that is modified by newlocale / duplocale / freelocale. MFC after: 1 week	2015-04-24 10:21:20 +00:00
David Chisnall	58912ae7c3	Small changes to locale-related man pages. Fix a missing .h and change the recommended include for the POSIX2008 functions from xlocale.h to locale.h. Including xlocale.h is for legacy / Darwin compatibility so should not be encouraged.	2015-04-24 10:17:55 +00:00
Tijl Coosemans	1243a98e38	Remove the const qualifier from iconv(3) to comply with POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html Adjust all code that calls iconv. PR: 199099 Exp-run by: antoine MFC after: 2 weeks	2015-04-15 09:09:20 +00:00
Joel Dahl	4990a1c050	mdoc: improvements to SEE ALSO.	2014-12-27 08:31:52 +00:00
Pedro F. Giffuni	9908eab82e	libc/locale: Remove a wrong comma. This only had some effect when debugging. Obtained from: DragonflyBSD MFC after: 3 days	2014-09-04 17:36:21 +00:00
Pedro F. Giffuni	0716c0ff7a	minor perf enhancement for UTF-8 Reduce some duplicate code. Reference: https://www.illumos.org/issues/628 Obtained from: Illumos MFC after: 1 week	2014-07-04 22:39:39 +00:00
Pedro F. Giffuni	0f5132cd25	citrus: Avoid invalid code points. From the OpenBSD log: The UTF-8 decoder should not accept byte sequences which decode to unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF. http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 http://unicode.org/faq/utf_bom.html#utf8-4 Reported by: Stefan Sperling Obtained from: OpenBSD MFC after: 5 days	2014-05-01 01:42:48 +00:00
Pedro F. Giffuni	97ecaa8907	citrus: Avoid invalid code points. From the OpenBSD log: The UTF-8 decoder should not accept byte sequences which decode to unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF. http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 http://unicode.org/faq/utf_bom.html#utf8-4 Reported by: Stefan Sperling Obtained from: OpenBSD MFC after: 5 days	2014-04-29 15:25:57 +00:00


603 Commits
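Commits 57c69b1478, 8bb93485fb, c9d24bcfd5 and 81eb7d7e4b tighten UTF-8 handling per RFC 3629: 5- and 6-byte sequences go away, code points above 0x10FFFF are rejected, and UTF-16 surrogates (0xD800..0xDFFF) are refused. The sketch below shows what such a strict single-sequence decoder can look like. It is not FreeBSD's mbrtowc(); the name utf8_decode_strict and its "return 0 on any error" convention are hypothetical simplifications (real mbrtowc distinguishes invalid from incomplete input). It also rejects overlong forms, which RFC 3629 requires, but unlike the citrus change in 0f5132cd25/97ecaa8907 it does not reject U+FFFE and U+FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not libc's mbrtowc): decode one UTF-8 sequence from
 * s, with n bytes available. Returns the sequence length (1-4) and stores
 * the code point in *cp, or returns 0 if the bytes are invalid or truncated.
 */
static size_t
utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *cp)
{
	uint32_t c, min;
	size_t len, i;

	if (n == 0)
		return 0;
	if (s[0] < 0x80) {			/* 1 byte: U+0000..U+007F */
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {	/* 2 bytes: 110xxxxx */
		len = 2; c = s[0] & 0x1F; min = 0x80;
	} else if ((s[0] & 0xF0) == 0xE0) {	/* 3 bytes: 1110xxxx */
		len = 3; c = s[0] & 0x0F; min = 0x800;
	} else if ((s[0] & 0xF8) == 0xF0) {	/* 4 bytes: 11110xxx */
		len = 4; c = s[0] & 0x07; min = 0x10000;
	} else {
		/* A stray continuation byte, a 5- or 6-byte lead (0xF8..0xFD),
		   or 0xFE/0xFF: none of these is valid under RFC 3629. */
		return 0;
	}
	if (n < len)
		return 0;			/* incomplete sequence */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;		/* not a 10xxxxxx continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < min)
		return 0;			/* overlong encoding */
	if (c > 0x10FFFF)
		return 0;			/* beyond the Unicode code space */
	if (c >= 0xD800 && c <= 0xDFFF)
		return 0;			/* UTF-16 surrogate */
	*cp = c;
	return len;
}
```

For example, the bytes E2 82 AC decode to U+20AC in 3 bytes, while F4 90 80 80 (which would be U+110000) and ED A0 80 (U+D800) are both rejected.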
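Commit d8ed03efe5 traces the eucJP sorting failure to an OR that builds the wide character from bytes without first making them unsigned. The sketch below reproduces that class of bug in isolation. It assumes, for illustration only, a simple scheme in which the wide value is the raw bytes concatenated big-endian; the function names are hypothetical and this is not the libc EUC decoder. On platforms where plain char is signed (such as amd64), a byte like 0xA4 reads as -92 and converts to 0xFFFFFFA4, so its high bits overwrite everything accumulated so far.

```c
#include <stddef.h>
#include <stdint.h>

/* Buggy: each byte goes through signed char, so values >= 0x80 are
   negative and sign-extend to 0xFFFFFFxx before the OR. */
static uint32_t
combine_signed(const signed char *s, size_t n)
{
	uint32_t wc = 0;

	for (size_t i = 0; i < n; i++)
		wc = (wc << 8) | (uint32_t)s[i];
	return wc;
}

/* Fixed: reading through unsigned char keeps every byte in 0..255,
   so the OR only sets the low 8 bits. */
static uint32_t
combine_unsigned(const unsigned char *s, size_t n)
{
	uint32_t wc = 0;

	for (size_t i = 0; i < n; i++)
		wc = (wc << 8) | s[i];
	return wc;
}
```

With the eucJP pair 0xA4 0xA2 (HIRAGANA LETTER A), combine_unsigned yields 0xA4A2, while combine_signed collapses to 0xFFFFFFA2 on a two's-complement machine, which no longer identifies the character.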
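Commit d79cdd21de describes tagging single-byte LC_CTYPE encodings as "NONE:<codeset>" (for example "NONE:ISO8859-15") and having the locale handlers treat any encoding whose first four characters are "NONE" as single-byte. A minimal sketch of that parsing, assuming only the convention stated in the commit message, follows. Both function names are hypothetical; they are not libc or localedef interfaces, and the sketch says nothing about how multibyte codesets are reported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: true if the rune encoding name marks a single-byte
   encoding, checking only the first four characters as described. */
static bool
encoding_is_single_byte(const char *enc)
{
	return strncmp(enc, "NONE", 4) == 0;
}

/* Hypothetical: returns the codeset after "NONE:", or NULL when enc is
   not single-byte or carries no codeset suffix. */
static const char *
single_byte_codeset(const char *enc)
{
	/* Short-circuit order guarantees enc[5] is only read when enc[4]
	   is ':' (and therefore not the terminating NUL). */
	if (!encoding_is_single_byte(enc) || enc[4] != ':' || enc[5] == '\0')
		return NULL;
	return enc + 5;
}
```

Under these assumptions, "NONE:ISO8859-15" yields "ISO8859-15", while a bare "NONE" yields NULL and is left to whatever fallback the caller chooses.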
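Commit 473aa0b7ee (and the later restore of ascii.c in 49c4407313) separates US-ASCII from the POSIX handling so that bytes 0x80 and above raise EILSEQ. The sketch below shows that rule as an mbrtowc-style function. The name ascii_mbrtowc_sketch and its reduced signature (no mbstate_t, since 7-bit ASCII has no shift state) are hypothetical and not taken from FreeBSD's ascii.c; the return values follow the usual mbrtowc conventions.

```c
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical strict 7-bit US-ASCII decoder in the style of mbrtowc(). */
static size_t
ascii_mbrtowc_sketch(wchar_t *pwc, const char *s, size_t n)
{
	unsigned char c;

	if (s == NULL)
		return 0;		/* no shift state to reset */
	if (n == 0)
		return (size_t)-2;	/* no byte available yet */
	c = (unsigned char)*s;
	if (c >= 0x80) {
		errno = EILSEQ;		/* outside the 7-bit range */
		return (size_t)-1;
	}
	if (pwc != NULL)
		*pwc = (wchar_t)c;
	return c == '\0' ? 0 : 1;	/* 0 signals the NUL character */
}
```

So "A" decodes to L'A' using one byte, while "\x80" fails with EILSEQ.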