freebsd-dev

Author	SHA1	Message	Date
Andrey A. Chernov	e08c3b7c11	EUC-type encodings don't have single byte characters >= 128 This change should not be MFCed until new collate will be MFCed first, because our old EUC tables have some hacks for missing codesets.	2016-04-04 02:43:35 +00:00
Pedro F. Giffuni	45256214eb	mbtowc(3): set errno to EILSEQ if an incomplete character is passed. According to POSIX, The mbtowc() function shall fail if: [EILSEQ] An invalid character sequence is detected. Reviewed by: bapt Differential Revision: https://reviews.freebsd.org/D5496 Obtained from: OpenBSD (Ingo Schwarze) MFC after: 1 month	2016-03-01 19:15:34 +00:00
Enji Cooper	0b5cc81d3b	Link localeconv(3) to localeconv_l(3) MFC after: 3 days	2015-11-25 09:12:30 +00:00
Baptiste Daroussin	87101cb572	return "US-ASCII" instead of "POSIX" for "C" and "POSIX" locales as it used to be in previous version of the locales. Returning "POSIX" has too many fallouts.	2015-11-10 08:11:27 +00:00
Baptiste Daroussin	403105944d	nl_langinfo: Simplify case ladder The NONE:US-ASCII case isn't necessary. The "NONE:" case will handle US-ASCII, so let's remove the redundant handling. Submitted by: marino Obtained from: DragonflyBSD	2015-11-09 22:29:47 +00:00
Baptiste Daroussin	22b87a3555	Readd ascii.c forgotten in r290618	2015-11-09 22:11:37 +00:00
Baptiste Daroussin	473aa0b7ee	locales: Enforce US-ASCII encoding (limited to 7-bit) The US-ASCII format was getting treated identically to POSIX. It is supposed to throw an ILSEQ errno if a value of 0x80 or greater is encountered, so let's bring back the "ASCII" handling. While here, change nl_codeset to return US-ASCII only when the encoding really is "US-ASCII". Before "C" and "POSIX" encoding returned this string, so now they return "POSIX". Discussed with: ache Submitted by: marino Obtained from: DragonflyBSD	2015-11-09 22:06:22 +00:00
Baptiste Daroussin	e58504783b	Fix mbtowc not setting EILSEQ on an incomplete multibyte sequence for eucJP encoding	2015-11-02 22:56:24 +00:00
Baptiste Daroussin	d8ed03efe5	locales: Fix eucJP sorting (broken upstream?) Sorting eucJP text with "sort" resulted in an illegal sequence while "gsort" worked. This was traced back to mbrtowc handling which was broken for eucJP (probably eucCN, eucKR, and eucTW as well). This small fix took hours to figure out. The OR operation to build the wide character requires an unsigned character to work correctly. The euc wcrtowc conversion is probably broken upstream in Illumos as well. Triggered by: misc/freebsd-doc-ja in ports (encoded in eucJP) Submitted by: marino Obtained from: DragonflyBSD	2015-11-01 21:02:30 +00:00
Baptiste Daroussin	d79cdd21de	libc: Fix (and improve) nl_langinfo (CODESET) The output of "locale charmap" is identical to the result of nl_langinfo (CODESET) for any given locale. The logic for returning the codeset was very simplistic. It just returned the portion of the locale name after the period (e.g. en_FR.ISO8859-1 returned "ISO8859-1"). When softlinks were added to locales, this broke. e.g.: en_US returned "" en_FR.UTF8 returned "UTF8" en_FR.UTF-8 returned "UTF-8" zh_Hant_HK.Big5HKSCS returned "Big5HKSCS" zh_Hant_TW.Big5 returned "Big5" es_ES@euro returned "" In order to fix this properly, the named locale cannot be used to determine the encoding. This information was almost available in the rune data. Unfortunately, all the single byte encodings were listed as "NONE" encoding. So I adjusted localedef tool to provide more information about the encoding. For example, instead of "NONE", the LC_CTYPE used by fr_FR.ISO8859-15 is now encoded as "NONE:ISO8859-15". The locale handlers now check if the first four characters of the encoding are "NONE" and if so, treat it as a single-byte encoding. The nl_langinfo handling of CODESET was adjusted accordingly. Now the following is returned: en_US returns "ISO8859-1" fr_FR.UTF8 returns "UTF-8" fr_FR.UTF-8 returns "UTF-8" zh_Hant_HK.Big5HKSCS returns "Big5" zh_Hant_TW.Big5 returns "Big5" es_ES@euro returns "ISO8859-15" as before, "C" and "POSIX" locales return "US-ASCII". This is a big improvement. The result of nl_langinfo can never be a zero-length string and it will always be exclusively one of the values of the character maps of /usr/src/tools/tools/locale/etc/final-maps. Submitted by: marino Obtained from: DragonflyBSD	2015-11-01 12:00:55 +00:00
Baptiste Daroussin	76e6db686e	collate: Fix expansion substitutions (broken upstream too) Through testing, the user noted that some Cyrillic characters were not sorting correctly, and this was confirmed. After extensive testing and review, the localedef tool was eliminated as the culprit. The substitutions were encoded correctly in LC_COLLATE. The error was mainly in wcscoll where character expansions were mishandled. The main directive pass routines had to be written to go back for a new collation value when the "state" variable was set. Before pointers were being advanced, the second lookup was getting applied to the wrong character, etc. The "eat expansion codes" section in collate.c also had a bug. Later on, the "state" variable logic was changed to only set if next code was greater than zero (rather than >= 0). Some additional cleanups got captured from previous work: 1) The previous commit moved the binary search comment from the correct location to a wrong location because it's wrong upstream in Illumos. The comment has little value so I just removed it. 2) Don't check if pointers are null before freeing, this is redundant as free() handles null pointers. 3) The two binary search trees were standardized wrt initialization 4) On the binary search trees, a negative "high" exits rather than checking the table count again. Submitted by: marino Obtained from: DragonflyBSD	2015-10-23 23:24:03 +00:00
Baptiste Daroussin	332fe83717	libc/collate: minor tweaks / fix The main "fix" here is properly setting a collate loading error for each early return. Tweaks include removing unnecessary null checks, adding assertions (from Illumos) and a couple of variables to reduce code differences and improve readability. For normal use, there are no functional changes here. Obtained from: DragonflyBSD, Illumos	2015-10-22 14:29:19 +00:00
Baptiste Daroussin	c25f5140e9	Include sys/*.h earlier Reported by: kib	2015-10-14 12:46:05 +00:00
Baptiste Daroussin	f5dde0166d	Commit log from Dragonfly: FreeBSD extended ctypes to include numbers (e.g. isnumber()) but never actually implemented it. The isnumber() function was equivalent to the isdigit() function in every case. Now that DragonFly's ctype source files have number definitions, the number ctype can finally be implemented. It's given a new flag _CTYPE_N. The isalnum() and iswalnum() functions have been changed to use this flag rather than the _CTYPE_D digit flag. While isalnum(), isnumber(), and their wide equivalents now return different values in locale cases, the ishexnumber() and iswhexnumber() functions are unchanged. They are still aliases for isxdigit() and iswxdigit(). Also change ctype.h for isdigit and isxdigit to use sbistype like the other functions. Obtained from: dragonfly	2015-10-13 20:43:49 +00:00
Baptiste Daroussin	becbad1f6e	Merge from head	2015-10-13 19:44:36 +00:00
Craig Rodrigues	c83f3fc4b4	Use ANSI C prototypes. Eliminates -Wold-style-definition warnings.	2015-09-20 20:50:18 +00:00
Baptiste Daroussin	23a32822d2	Merge from HEAD	2015-08-25 20:14:50 +00:00
Ed Schouten	57c69b1478	Make UTF-8 parsing and generation more strict. - in mbrtowc() we need to disallow codepoints above 0x10ffff. - In wcrtomb() we need to disallow codepoints between 0xd800 and 0xdfff. Reviewed by: bapt Differential Revision: https://reviews.freebsd.org/D3399	2015-08-25 09:16:09 +00:00
Baptiste Daroussin	eaa94ab419	Fix typo	2015-08-09 12:20:22 +00:00
Baptiste Daroussin	28a20bb3f5	Use more asprintf Plug memory leak introduced in previous asprintf addition	2015-08-09 12:13:30 +00:00
Baptiste Daroussin	b89704cee7	Use asprintf/free instead of snprintf	2015-08-09 11:50:50 +00:00
Baptiste Daroussin	5e4bbc69de	Remove useless variable	2015-08-09 11:47:01 +00:00
Baptiste Daroussin	81eb7d7e4b	Readd checking utf16 surrogates that are invalid in utf8	2015-08-09 10:36:25 +00:00
Baptiste Daroussin	a6d2922cbb	Mark __collate_load_tables_l as static Remove useless addition to Symbols.map	2015-08-09 10:24:24 +00:00
Baptiste Daroussin	536451f914	Fix typo Fix bad location for include Reported by: jilles	2015-08-09 00:21:59 +00:00
Baptiste Daroussin	764a768e16	Merge from HEAD	2015-08-09 00:15:17 +00:00
Baptiste Daroussin	8bb93485fb	Remove 5 and 6 bytes sequences which are illegal in UTF-8 space. (part2) Per rfc3629 value greater than 0x10ffff should be rejected Suggested by: jilles	2015-08-09 00:06:56 +00:00
Baptiste Daroussin	c9d24bcfd5	Remove 5 and 6 bytes sequences which are illegal in UTF-8 space. Per rfc3629 value greater than 0x10ffff should be rejected Suggested by: jilles	2015-08-08 23:59:15 +00:00
Baptiste Daroussin	d0a68f8d38	Fix typo	2015-08-08 23:17:10 +00:00
Baptiste Daroussin	7b2473410f	Revamp CTYPE support (from Illumos & Dragonfly) Obtained from: Dragonfly	2015-08-08 18:22:14 +00:00
Baptiste Daroussin	2a6abeebef	The collate functions within libc have been using version 1 and 1.2 of the packed LC_COLLATE binary formats. These were generated with the colldef tool, but the new LC_COLLATE files are going to be generated by the new localedef tool using CLDR POSIX files as input. The BSD-flavored version of localedef identifies the format as "BSD 1.0". Any LC_COLLATE file with a different version will simply not be loaded, and all LC* categories will get set to "C" (aka "POSIX") locale. This work is based off of Nexenta's contribution to Illumos. The integration with xlocale is John Marino's work for Dragonfly. The following commits will enable localedef tool, disable the colldef tool, add generated colldef directory, and finally remove colldef from base. The only differences from Dragonfly are: - a few fixes to build with clang - And identification of the flavor as "BSD 1.0" instead of "Dragonfly 4.4" Obtained from: Dragonfly	2015-08-07 23:41:26 +00:00
David Chisnall	710ec77e3a	__xlocale_C_ctype should not be const. It contains a reference count that is modified by newlocale / duplocale / freelocale. MFC after: 1 week	2015-04-24 10:21:20 +00:00
David Chisnall	58912ae7c3	Small changes to locale-related man pages. Fix a missing .h and change the recommended include for the POSIX2008 functions from xlocale.h to locale.h. Including xlocale.h is for legacy / Darwin compatibility so should not be encouraged.	2015-04-24 10:17:55 +00:00
Tijl Coosemans	1243a98e38	Remove the const qualifier from iconv(3) to comply with POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html Adjust all code that calls iconv. PR: 199099 Exp-run by: antoine MFC after: 2 weeks	2015-04-15 09:09:20 +00:00
Joel Dahl	4990a1c050	mdoc: improvements to SEE ALSO.	2014-12-27 08:31:52 +00:00
Pedro F. Giffuni	9908eab82e	libc/locale: Remove a wrong comma. This only had some effect when debugging. Obtained from: DragonflyBSD MFC after: 3 days	2014-09-04 17:36:21 +00:00
Pedro F. Giffuni	0716c0ff7a	minor perf enhancement for UTF-8 Reduce some duplicate code. Reference: https://www.illumos.org/issues/628 Obtained from: Illumos MFC after: 1 week	2014-07-04 22:39:39 +00:00
Pedro F. Giffuni	0f5132cd25	citrus: Avoid invalid code points. From the OpenBSD log: The UTF-8 decoder should not accept byte sequences which decode to unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF. http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 http://unicode.org/faq/utf_bom.html#utf8-4 Reported by: Stefan Sperling Obtained from: OpenBSD MFC after: 5 days	2014-05-01 01:42:48 +00:00
Pedro F. Giffuni	97ecaa8907	citrus: Avoid invalid code points. From the OpenBSD log: The UTF-8 decoder should not accept byte sequences which decode to unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE, and U+FFFF. http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 http://unicode.org/faq/utf_bom.html#utf8-4 Reported by: Stefan Sperling Obtained from: OpenBSD MFC after: 5 days	2014-04-29 15:25:57 +00:00
David Chisnall	8d07b7deff	Fix an issue where the locale and rune locale could become out of sync, causing mb* functions (and similar) to be called with the wrong data (possibly a null pointer, causing a crash). PR: standards/188036 MFC after: 1 week	2014-04-02 11:10:46 +00:00
Marcel Moolenaar	8876613dc5	Replace use of ${.CURDIR} by ${LIBC_SRCTOP} and define ${LIBC_SRCTOP} if not already defined. This allows building libc from outside of lib/libc using a reach-over makefile. A typical use-case is to build a standard ILP32 version and a COMPAT32 version in a single iteration by building the COMPAT32 version using a reach-over makefile. Obtained from: Juniper Networks, Inc.	2014-03-04 02:19:39 +00:00
Peter Wemm	19dfd82d81	Replace the #define for "iconv" so it is for the function name instead of a macro with parameters. Remove a __DECONST hack and add consts instead for gnu libiconv API compatibility. This makes it work with things like devel/boost-libs that expects to use "iconv" as though it were a pointer.	2013-07-03 07:03:19 +00:00
Ed Schouten	49111f0092	Add libiconv based versions of c16() and c32(). I initially thought wchar_t was locale independent, but this seems to be only the case on Linux. This means that we cannot depend on the wc() routines to implement c16() and c32(). Instead, use the Citrus libiconv that is part of libc. I'll see if there is anything I can do to make the existing functions somewhat useful in case the system is built without libiconv in the near future. If not, I'll simply remove the broken implementations. Reviewed by: jilles, gabor	2013-06-03 17:17:56 +00:00
Ed Schouten	50c77c6e8b	Add <uchar.h>. The <uchar.h> header, part of C11, adds a small number of utility functions for 16/32-bit "universal" characters, which may or may not be UTF-16/32. As our wchar_t is already ISO 10646, simply add light-weight wrappers around wcrtomb() and mbrtowc(). While there, also add (not-yet-standard) _l functions, similar to the ones we already have for the other locale-dependent functions. Reviewed by: theraven	2013-05-21 19:59:37 +00:00
Sergey Kandaurov	424b842a5d	Document that the return type is different from 1003.1-2008. MFC after: 1 week	2013-05-04 17:21:44 +00:00
Sergey Kandaurov	4f79ce7b4c	mdoc: missing comma in .Dd macro.	2013-05-04 17:06:47 +00:00
Sergey Kandaurov	67ff590b9e	Also, add a missing period.	2013-05-03 13:27:13 +00:00
Sergey Kandaurov	cc7a693fa7	Remove an extra comma.	2013-05-03 12:45:45 +00:00
Sergey Kandaurov	88bbbe88d7	Remove the STANDARDS section. querylocale is not part of IEEE Std 1003.1-2008. MFC after: 3 days	2013-05-03 12:42:43 +00:00
Jilles Tjoelker	86bca8fb51	btowc(3), isblank(3): Correct prototypes for _l variants. MFC after: 1 week	2013-03-27 21:31:40 +00:00
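
Commit d8ed03efe5 above attributes the eucJP sorting failure to an OR operation that "requires an unsigned character to work correctly". The following is a minimal sketch of that class of bug, not the actual libc code: the function names `pack_signed`, `pack_unsigned`, and `pack_demo` are hypothetical, and the byte pair 0xA4 0xA2 is only an illustrative two-byte EUC sequence. It shows how a byte >= 0x80 read through a signed type is sign-extended before the OR and corrupts the upper bits of the assembled value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buggy variant: each byte goes through a signed type first. */
uint32_t pack_signed(const char *s, size_t n)
{
	uint32_t wc = 0;
	for (size_t i = 0; i < n; i++) {
		signed char b = (signed char)s[i];
		/* A byte >= 0x80 is negative here; widening it sets bits 8..31. */
		wc = (wc << 8) | (uint32_t)b;
	}
	return wc;
}

/* Hypothetical fixed variant: bytes are treated as unsigned before the OR. */
uint32_t pack_unsigned(const char *s, size_t n)
{
	uint32_t wc = 0;
	for (size_t i = 0; i < n; i++)
		wc = (wc << 8) | (uint32_t)(unsigned char)s[i];
	return wc;
}

/* Usage example: the same two bytes give different results. */
void pack_demo(void)
{
	const char euc[] = "\xa4\xa2";

	assert(pack_unsigned(euc, 2) == 0xA4A2u);      /* intended value */
	assert(pack_signed(euc, 2) == 0xFFFFFFA2u);    /* sign-extension damage */
}
```

With the signed variant the first byte is lost entirely, because the sign-extended second byte overwrites every bit above it. A decoder that then range-checks the result would likely reject the sequence, which is consistent with the "illegal sequence" symptom the commit describes.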

592 Commits