Commit Graph

610 Commits

Author SHA1 Message Date
vangyzen
a20c52a118 Fix error reporting from wcstof()
When wcstof() skipped initial space and then parsing failed, it set
endptr to the first non-space character.  Fix it to correctly report
failure by setting endptr to the beginning of the input string.
The fix is from theraven@, who fixed this bug in wcstod() and
wcstold() in r227753.

While I'm here:

Move assignments out of declarations in wcstod() and wcstold().
This is against my personal preference, but it is our agreed style(9).

Set endptr correctly on malloc() failure in all three functions.

Remove an incorrect comment:  This is pointer arithmetic,
so the code was not actually making that assumption.

wcstold() advanced the wcp pointer beyond leading whitespace
and then reset it back to the beginning of the string.
Do not reset it.  This seems to have no functional effect,
since strtold_l() also skips leading whitespace.  I'm making
the change to keep this function consistent with wcstof() and
wcstod(), and because the C11 spec prescribes the use of iswspace()
to skip leading space.

Reported by:	libc++ unit test for std::stof(std::wstring)
MFC after:	8 days
Sponsored by:	Dell EMC
2016-11-20 20:13:22 +00:00
br
4caf0a1775 Locale fix for endian big (EB) machines.
We have locale files generated on EL machines (e.g. during cross-build
on amd64 host), but then we are using them on EB machines (e.g. MIPS64EB),
so proceed byte-swap if necessary.

All the libc tests passed successfully, including Russian collation.

Tested by:	br@, Hongyan Xia <hx242@cam.ac.uk>
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D8281
2016-11-01 13:54:44 +00:00
ed
6e39e4860b Change the return type of freelocale(3) to void.
Our version of this function currently returns an integer indicating
failure or success, whereas POSIX specifies that this function has no
return value. It returns void. Patch up the header, sources and man page
to use the right type. While there, use the opportunity to simplify the
body of this function.

Theoretically speaking, this change breaks the ABI of this function.
That said, I have yet to find any code that makes use of freelocale()'s
return value. I couldn't find any of it in the base system, nor did an
exp-run reveal any breakage caused by this change.

PR:		211394 (exp-run)
2016-07-29 17:18:47 +00:00
pfg
10d5e58554 libc: tag the Rune initialization function prototypes visibility as hidden.
It is good practice to export as few symbols as possible from your shared
libraries, so use the GCC visibility attribute in this case, matching what
Apple's libc does.

Reference:
https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/CppRuntimeEnv/Articles/SymbolVisibility.html

Hinted by:	Apple's libc 1082.20.4
MFC after:	1 week
2016-07-19 20:22:13 +00:00
bapt
6a5377a237 Revert 302324 and properly fix the crash with ISO-8859-5 locales
PR:		211135
Reported by:	jkim
Tested by:	jkim
MFC after:	2 days
2016-07-15 23:03:20 +00:00
ache
c80f41274a 1) Eliminate possibility to call __*collate_range_cmp() with inclomplete
locale (which cause core dump) by removing whole 'table' argument
by which it passed.

2) Restore __collate_range_cmp() in __sccl().

3) Collating [a-z] range in regcomp() only for single bytes locales
(we can't do it now for other ones). In previous state only first 256
wchars are considered and all others are just silently dropped from the
range.
2016-07-14 09:07:25 +00:00
ache
34a9f4379b Back out non-collating [a-z] ranges.
Instead of changing whole course to another POSIX-permitted way
for consistency and uniformity I decide to completely ignore missing
regex fucntionality and concentrace on fixing bugs in what we have now,
too many small obstacles instead, counting ports.
2016-07-14 08:18:12 +00:00
ache
ea21df9888 Remove broken support for collation in [a-z] type ranges.
Only first 256 wide chars are considered currently, all other are just
dropped from the range. Proper implementation require reverse tables
database lookup, since objects are really big as max UTF-8 (1114112
code points), so just the same scanning as it was for 256 chars will
slow things down.

POSIX does not require collation for [a-z] type ranges and does not
prohibit it for non-POSIX locales. POSIX require collation for ranges
only for POSIX (or C) locale which is equal to ASCII and binary for
other chars, so we already have it.

No other *BSD implements collation for [a-z] type ranges.

Restore ABI compatibility with unused now __collate_range_cmp() which
is visible from outside (will be removed later).
2016-07-10 03:49:38 +00:00
bapt
3a5de13497 Fix a bad test resulting in a segfault with ISO-8859-5 locales
Reported by:	Lauri Tirkkonen from Illumos
Approved by:	re@ (gjb)
2016-07-03 15:00:12 +00:00
pfg
0366f1b527 libc/locale: Fix type breakage in __collate_range_cmp().
When collation support was brought in, the second and third
arguments in __collate_range_cmp() were changed from int to
wchar_t, breaking the ABI. Change them to a "char" type which
makes more sense and keeps the ABI compatible.

Also introduce __wcollate_range_cmp() which does work with wide
characters. This function is used only internally in libc so
we don't export it. Use the new function in glob(3), fnmatch(3),
and regexec(3).

PR:		179721
Suggested by:	ache. jilles
MFC after:	3 weeks (perhaps partial only)
2016-06-05 19:12:52 +00:00
ache
0abc93ce5b For EILSEQ case in mbsnrtowcs() and wcsnrtombs() update src to point to
the character after the one this conversion stopped at.

PR:     209907
Submitted by: Roel Standaert <roel@abittechnical.com> (partially)
MFC after:      3 days
2016-05-31 18:44:33 +00:00
pfg
69669cbe99 libc: spelling fixes.
Mostly on comments.
2016-04-30 01:24:24 +00:00
bapt
eaf20188e5 Restore the original ascii.c from prior to r290494
It was doing the right thing, there was no need to "fail" to reinvent it from
none.c

Pointy hat:	bapt
Submitted by:	ache
2016-04-21 07:36:11 +00:00
bapt
2253af0f89 Check the returned value of memchr(3) before using it
Reported by:	Coverity
CID:		1338530
2016-04-20 20:44:30 +00:00
pfg
6e91d78151 libc: replace 0 with NULL for pointers.
While here also cleanup some surrounding code; particularly
drop some malloc() casts.

Found with devel/coccinelle.

Reviewed by:	bde (previous version - all new bugs are mine)
2016-04-10 19:33:58 +00:00
ache
b6c01ffbf5 SJIS encoding don't have single byte characters >= 224
MFC after:      1 week
2016-04-04 15:56:14 +00:00
ache
d3bad64f6d EUC-type encodings don't have single byte characters >= 128
This change should not be MFCed until new collate will be
MFCed first, because our old EUC tables have some hacks for
missing codesets.
2016-04-04 02:43:35 +00:00
pfg
0e774f6016 mbtowc(3): set errno to EILSEQ if an incomplete character is passed.
According to POSIX, The mbtowc() function shall fail if:
[EILSEQ] An invalid character sequence is detected.

Reviewed by:		bapt
Differential Revision:	https://reviews.freebsd.org/D5496

Obtained from:	OpenBSD (Ingo Schwarze)
MFC after:	1 month
2016-03-01 19:15:34 +00:00
ngie
8751135c08 Link localeconv(3) to localeconv_l(3)
MFC after: 3 days
2015-11-25 09:12:30 +00:00
bapt
883271972d return "US-ASCII" instead of "POSIX" for "C" and "POSIX" locales
as it used to be in previous version of the locales. Returning
"POSIX" has too many fallouts.
2015-11-10 08:11:27 +00:00
bapt
ea0a81f953 nl_langinfo: Simplify case ladder
The NONE:US-ASCII case isn't necessary.  The "NONE:" case will handle
US-ASCII, so let's remove the redundant handling.

Submitted by:	marino
Obtained from:	DragonflyBSD
2015-11-09 22:29:47 +00:00
bapt
c93fdec4fc Readd ascii.c forgotten in r290618 2015-11-09 22:11:37 +00:00
bapt
439f8fc616 locales: Enforce US-ASCII encoding (limited to 7-bit)
The US-ASCII format was getting treated identically to POSIX.  It is
supposed to throw an ILSEQ errno if a value of 0x80 or greater is
encountered, so let's bring back the "ASCII" handling.

While here, change nl_codeset to return US-ASCII only when the encoding
really is "US-ASCII".  Before "C" and "POSIX" encoding returned this
string, so now they return "POSIX".

Discussed with:	ache
Submitted by:	marino
Obtained from:	DragonflyBSD
2015-11-09 22:06:22 +00:00
bapt
7edeed0f0f Fix mbtowc not setting EILSEQ on an Incomplete multibyte sequence for eucJP encoding 2015-11-02 22:56:24 +00:00
bapt
e29f1d2f89 locales: Fix eucJP sorting (broken upstream?)
Sorting eucJP text with "sort" resulted in an illegal sequence while
"gsort" worked.  This was traced back to mbrtowc handling which was
broken for eucJP (probably eucCN, eucKR, and eucTW as well).  This
small fix took hours to figure out.  The OR operation to build the
wide character requires an unsigned character to work correctly.  The
euc wcrtowc conversion is probably broken upstream in Illumos as well.

Triggered by: misc/freebsd-doc-ja in ports (encoded in eucJP)

Submitted by:	marino
Obtained from:	DragonflyBSD
2015-11-01 21:02:30 +00:00
bapt
8abd49c1b5 libc: Fix (and improve) nl_langinfo (CODESET)
The output of "locale charmap" is identical to the result of
nl_langinfo (CODESET) for any given locale.  The logic for returning the
codeset was very simplistic.  It just returned portion of the locale name
after the period (e.g. en_FR.ISO8859-1 returned "ISO8859-1").

When softlinks were added to locales, this broke.  e.g.:
   en_US returned ""
   en_FR.UTF8 returned "UTF8"
   en_FR.UTF-8 returned "UTF-8"
   zh_Hant_HK.Big5HKSCS returned "Big5HKSCS"
   zh_Hant_TW.Big5 returned "Big5"
   es_ES@euro returned ""

In order to fix this properly, the named locale cannot be used to
determine the encoding.  This information was almost available in the
rune data.  Unfortunately, all the single byte encodings were listed
as "NONE" encoding.

So I adjusted localedef tool to provide more information about the
encoding.  For example, instead of "NONE", the LC_CTYPE used by
fr_FR.ISO8859-15 is now encoded as "NONE:ISO8859-15".  The locale
handlers now check if the first four characters of the encoding is
"NONE" and if so, treats it as a single-byte encoding.

The nl_langinfo handling of CODESET was adjusting accordingly.  Now the
following is returned:
   en_US returns "ISO8859-1"
   fr_FR.UTF8 returns "UTF-8"
   fr_FR.UTF-8 returns "UTF-8"
   zh_Hant_HK.Big5HKSCS returns "Big5"
   zh_Hant_TW.Big5 returns "Big5"
   es_ES@euro returns "ISO8859-15"

as before, "C" and "POSIX" locales return "US-ASCII".  This is a big
improvement.  The result of nl_langinfo can never be a zero-length
string and it will always exclusively one of the values of the
character maps of /usr/src/tools/tools/locale/etc/final-maps.

Submitted by:	marino
Obtained from:	DragonflyBSD
2015-11-01 12:00:55 +00:00
bapt
ffec40a708 collate: Fix expansion substitions (broken upstream too)
Through testing, the user noted that some Cyrillic characters were not
sorting correctly, and this was confirmed.

After extensive testing and review, the localedef tool was eliminated
as the culprit.  The sustitutions were encoded correctly in LC_COLLATE.

The error was mainly in wcscoll where character expansions were
mishandled.  The main directive pass routines had to be written to
go back for a new collation value when the "state" variable was set.
Before pointers were being advanced, the second lookup was gettting
applied to the wrong character, etc.

The "eat expansion codes" section on collate.c also had a bug.  Later
own, the "state" variable logic was changed to only set if next
code was greater than zero (rather than >= 0).

Some additional cleanups got captured from previous work:
1) The previous commit moved the binary search comment from the
   correct location to a wrong location because it's wrong upstream
   in Illumos.  The comment has little value so I just removed it.
2) Don't check if pointers are null before freeing, this is
   redundant as free() handles null pointers.
3) The two binary search trees were standardized wrt initialization
4) On the binary search trees, a negative "high" exits rather than
   checking the table count again.

Submitted by:	marino
Obtained from:	DragonflyBSD
2015-10-23 23:24:03 +00:00
bapt
69d2b113da libc/collate: minor tweaks / fix
The main "fix" here is properly setting a collate loading error for each
early return.  Tweaks include removing unnecessary null checks, adding
assertions (from Illumos) and a couple of variables to reduces code
differences and improve readability.  For normal use, there are no
functional changes here.

Obtained from:	DragonflyBSD, Illumos
2015-10-22 14:29:19 +00:00
bapt
cd8d4b2207 Include sys/*.h earlier
Reported by:	kib
2015-10-14 12:46:05 +00:00
bapt
979669d923 Commit log from Dragonfly:
FreeBSD extended ctypes to include numbers (e.g. isnumber()) but never
actually implemented it.  The isnumber() function was equivalent to the
isdigit() function in every case.

Now that DragonFly's ctype source files have number definitions, the
number ctype can finally be implemented.  It's given a new flag _CTYPE_N.
The isalnum() and iswalnum() functions have been changed to use this
flag rather than the _CTYPE_D digit flag.

While isalnum(), isnumber(), and their wide equivalents now return
different values in locale cases, the ishexnumber() and iswhexnumber()
functions are unchanged.  They are still aliases for isxdigit() and
iswxdigit().

Also change ctype.h for isdigit and isxdigit to use sbistype like the
other functions.

Obtained from:	dragonfly
2015-10-13 20:43:49 +00:00
bapt
c8d6d4a785 Merge from head 2015-10-13 19:44:36 +00:00
rodrigc
aa990038b1 Use ANSI C prototypes. Eliminates -Wold-style-definition warnings. 2015-09-20 20:50:18 +00:00
bapt
2a77c3b71d Merge from HEAD 2015-08-25 20:14:50 +00:00
ed
94d0b79c32 Make UTF-8 parsing and generation more strict.
- in mbrtowc() we need to disallow codepoints above 0x10ffff.
- In wcrtomb() we need to disallow codepoints between 0xd800 and 0xdfff.

Reviewed by:	bapt
Differential Revision:	https://reviews.freebsd.org/D3399
2015-08-25 09:16:09 +00:00
bapt
1682cb1283 Fix typo 2015-08-09 12:20:22 +00:00
bapt
4e0791ae19 Use more asprintf
Plug memory leak introduced in previous asprintf addition
2015-08-09 12:13:30 +00:00
bapt
572a3c0af5 Use asprintf/free instead of snprintf 2015-08-09 11:50:50 +00:00
bapt
b0838ae3bd Remove useless variable 2015-08-09 11:47:01 +00:00
bapt
4f14f063cf Readd checking utf16 surrogates that are invalid in utf8 2015-08-09 10:36:25 +00:00
bapt
32118b1361 Mark __collate_load_tables_l as static
Remove useless addition to Symbols.map
2015-08-09 10:24:24 +00:00
bapt
9e9acbb2b8 Fix typo
Fix bad location for include

Reported by:	jilles
2015-08-09 00:21:59 +00:00
bapt
2c81d16589 Merge from HEAD 2015-08-09 00:15:17 +00:00
bapt
9e0d1c3e19 Remove 5 and 6 bytes sequences which are illegal in UTF-8 space. (part2)
Per rfc3629 value greater than 0x10ffff should be rejected

Suggested by:	jilles
2015-08-09 00:06:56 +00:00
bapt
df871b442a Remove 5 and 6 bytes sequences which are illegal in UTF-8 space.
Per rfc3629 value greater than 0x10ffff should be rejected

Suggested by:	jilles
2015-08-08 23:59:15 +00:00
bapt
9dc264a4d3 Fix typo 2015-08-08 23:17:10 +00:00
bapt
0de7690917 Revamp CTYPE support (from Illumos & Dragonfly)
Obtained from:	Dragonfly
2015-08-08 18:22:14 +00:00
bapt
11a5726cda The collate functions within libc have been using version 1 and 1.2 of the
packed LC_COLLATE binary formats. These were generated with the colldef
tool, but the new LC_COLLATE files are going to be generated by the new
localedef tool using CLDR POSIX files as input.  The BSD-flavored
version of localedef identifies the format as "BSD 1.0".  Any
LC_COLLATE file with a different version will simply not be loaded, and
all LC* categories will get set to "C" (aka "POSIX") locale.

This work is based off of Nexenta's contribution to Illumos.
The integration with xlocale is John Marino's work for Dragonfly.

The following commits will enable localedef tool, disable the colldef
tool, add generated colldef directory, and finally remove colldef from
base.

The only difference with Dragonfly are:
- a few fixes to build with clang
- And identification of the flavor as "BSD 1.0" instead of "Dragonfly 4.4"

Obtained from:	Dragonfly
2015-08-07 23:41:26 +00:00
theraven
b4f9c4540d __xlocale_C_ctype should not be const. It contains a reference count that is modified by newlocale / duplocale / freelocale.
MFC after:	1 week
2015-04-24 10:21:20 +00:00
theraven
8d131b2bfd Small changes to locale-related man pages.
Fix a missing .h and change the recommended include for the POSIX2008 functions from xlocale.h to locale.h.  Including xlocale.h is for legacy / Darwin compatibility so should not be encouraged.
2015-04-24 10:17:55 +00:00
tijl
b0813ee288 Remove the const qualifier from iconv(3) to comply with POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html

Adjust all code that calls iconv.

PR:		199099
Exp-run by:	antoine
MFC after:	2 weeks
2015-04-15 09:09:20 +00:00