Commit Graph

635 Commits

Author SHA1 Message Date
Thomas Munro
cc7edd258c Add collation version support to querylocale(3).
Provide a way to ask for an opaque version string for a locale_t, so
that potential changes in sort order can be detected.  Similar to
ICU's ucol_getVersion() and Windows' GetNLSVersionEx(), this API is
intended to allow databases to detect when text order-based indexes
might need to be rebuilt.

The CLDR version is extracted from CLDR source data by the Makefile
under tools/tools/locale, written into the machine-generated Makefile
under shared/colldef, passed to localedef -V, and then written into
LC_COLLATE file headers.  The initial version is 34.0.
tools/tools/locale was recently updated to pull down 35.0, but the
output hasn't been committed under share/colldef yet, so that will
provide the first observable change when it happens.  Other versioning
schemes are possible in future, because the format is unspecified.

Reviewed by:	bapt, 0mp, kib, yuripv (albeit a long time ago)
Differential Revision:	https://reviews.freebsd.org/D17166
2020-11-08 02:50:34 +00:00
Mark Johnston
afd95785c0 newlocale(3): Fix a memory leak.
newlocale() optionally takes a "base" locale, from which components not
specified in the mask are inherited.  POSIX says that newlocale() may
modify "base" and return it, or free "base" and return a newly allocated
locale.  We were not doing either, so applications which use newlocale()
to modify an existing base locale end up leaking memory on FreeBSD.

This diff fixes the leak by releasing a reference to the base locale
before returning.  This is less efficient than modifying "base"
directly, but is simpler for an initial bug fix.  Also, update the man
page to clarify behaviour with respect to "base".

PR:		249416
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26522
2020-10-02 18:35:55 +00:00
Gordon Bergling
eef7327a68 setlocale(3): Add an EXAMPLES section and add LANG category
PR:		41824
Submitted by:	Slaven Rezic <eserte atvran dot herceg dot de>
Obtained from:	NetBSD
MFC after:	1 week
2020-08-07 17:25:56 +00:00
Gordon Bergling
90fb6afc55 mbsrtowcs(3): Clarify the RETURN VALUES section
PR:		215848
Submitted by:	Andrew Stevenson <andrew at ugh dot net dot au>
MFC after:	1 week
2020-08-07 16:56:43 +00:00
Joerg Wunsch
72adb2c07f Explain how to learn about possible recognized locale names
Posix says that the interpretation of the locale string is
"implementation-defined", so we ought to document what is
actually recognized.

Also add a cross reference to locale(1).

PR:		247553
MFC after:	1 week
2020-06-27 20:55:47 +00:00
Mateusz Piotrowski
89064ec6ef Use proper mdoc(7) macros for literal text and do not use Tn
Tn is deprecated and upsets linters.

MFC after:	3 days
2020-04-01 09:01:35 +00:00
Mark Johnston
733983be9b Fix uselocale(3) to not leak a reference to the old locale.
In a single-threaded program pthread_getspecific() always returns NULL,
so the old locale would not end up being freed.

PR:		239520
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-03-20 20:02:53 +00:00
Mark Johnston
57e642365b libc: Fix a few bugs in the xlocale collation code.
- Fix checks for mmap() failures. [1]
- Set the "map" and "maplen" fields of struct xlocale_collate so that
  the table destructor actually does something.
- Free an already-mapped collation file before loading a new one into
  the global table.
- Harmonize the prototype and definition of __collate_load_tables_l() by
  adding the "static" qualifier to the latter.

PR:		243195
Reported by:	cem [1]
Reviewed by:	cem, yuripv
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23109
2020-01-09 20:49:26 +00:00
Ed Maste
3bac34907e localeconv: correct grouping and mon_grouping per C/POSIX
grouping and mon_grouping should be "" in the C locale.

PR:		172215
MFC after:	6 weeks
Sponsored by:	The FreeBSD Foundation
2019-12-19 17:01:25 +00:00
Li-Wen Hsu
5d1f74b63d Improve the description of big5(5)
- Fix the statement that big5 is a de facto standard of Traditional Chinese
  text [1]
- Add a BUGS section describes the problem of big5 and suggests use utf8

PR:		189095
Submitted by:	Brennan Vincent <brennan@umanwizard.com> [1]
Reviewed by:	Ting-Wei Lan <lantw44@gmail.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21622
2019-09-14 08:15:16 +00:00
Baptiste Daroussin
b3f9b73820 In FreeBSD 11 localedef(1) has replaced the mklocale(1) and colldef(1)
tools to generate the locales data. state it in the libc manpages.

MFC after:	3 days
2019-09-10 07:47:52 +00:00
Yuri Pankov
98f3acfaaf Fix WITHOUT_ICONV build after r340276.
Reported by:	olivier
Approved by:	kib (mentor, implicit)
2018-11-14 09:06:15 +00:00
Yuri Pankov
9eb7d595e1 Reset persistent mbstates when rune locale encoding changes.
This was shown to be a problem by side effect of now-enabled test case,
which was going through C, en_US.UTF-8, ja_JP.SJIS, and ja_JP.eucJP,
and failing eventually as data in mbrtowc's mbstate, that was
perfectly correct for en_US.UTF-8 was treated as incorrect for
ja_JP.SJIS, failing the entire test case.

This makes the persistent mbstates to be per ctype-component,
and not per-locale so we could easily reset the mbstates when
only LC_CTYPE is changed.

Reviewed by:	bapt, pfg
Approved by:	kib (mentor, implicit)
Differential Revision:	https://reviews.freebsd.org/D17796
2018-11-09 03:32:53 +00:00
Yuri Pankov
dd7c41a378 Add hybrid C.UTF-8 locale being identical to default C locale except
that it uses the same ctype maps and functions as other UTF-8 locales.

Reviewed by:	bapt, cem, eadler
Approved by:	kib (mentor, implicit)
Differential Revision:	https://reviews.freebsd.org/D17833
2018-11-04 22:13:22 +00:00
Yuri Pankov
4644f9bef6 Add -b/-l options to localedef(1) to specify output endianness and use
it appropriately when building share/ctypedef and share/colldef.

This makes the resulting locale data in EL->EB (amd64->powerpc64) cross
build and in the native EB build match.  Revert the changes done to libc
in r308170 as they are no longer needed.

PR:		231965
Reviewed by:	bapt, emaste, sbruno, 0mp
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D17603
2018-10-20 20:51:05 +00:00
Edward Tomasz Napierala
604f1c416c Don't put multiple names on a single .Nm line. This fixes apropos(1)
output, from this:

strnlen, strlen, strlen,(3) - find length of string                                                                                                                                                     │·······

... to this:

strlen, strnlen(3) - find length of string

PR:		223525
MFC after:	2 weeks
2018-04-17 09:05:46 +00:00
Eitan Adler
5a51239a71 libc/locale: fix an off-by-one in newlocale
Reported by:	zrj@DragonFlyBSD.org
2017-12-29 14:56:46 +00:00
Pedro F. Giffuni
91fb056ed6 SPDX: Fix some License ID tags for libc. 2017-12-27 21:21:03 +00:00
Pedro F. Giffuni
d915a14ef0 libc: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-25 17:12:48 +00:00
Pedro F. Giffuni
8a16b7a18f General further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:49:47 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Bryan Drewery
dc8507e1f7 __setrunelocale: Fix asprintf(3) failure not returning an error.
Also fix the style of the asprintf(3) call in __collate_load_tables_l().
Both of these lines were modified away from snprintf(3) during the
import from DragonFly/Illumos.

Reviewed by:	jilles (briefly over shoulder)
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-09-29 16:30:50 +00:00
David Chisnall
c0cd38223c Document some invariants for the XLC_ enum.
These can't be reordered without breaking other code.  Document that and add
some static asserts to ensure that anyone who tries gets build failures.
2017-09-07 17:51:35 +00:00
Pedro F. Giffuni
be53a489c6 libc: minor indent(1) cleanups.
Illumos and Schillix is adopting some of the locale code and our style(9)
sometimes matches the Solaris cstyle, so the changes are also useful as a
way to reduce diffs.

No functional change.

Discussed with: Joerg Schilling
MFC after:	1 week
2017-08-26 16:11:21 +00:00
Enji Cooper
5e84ba7a43 localeconv(3): start sentences on new lines
Reported by:	make manlint
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-05-23 07:09:26 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Pedro F. Giffuni
ba7161ed72 Move __hidden attribute towards the end of the declaration.
Apple had them at the start but moving them to the end is better for
faster reading and fits better what is done in other FreeBSD headers.

MFC after:	5 days
2016-12-31 15:30:00 +00:00
Eric van Gyzen
81f91de9f7 Fix error reporting from wcstof()
When wcstof() skipped initial space and then parsing failed, it set
endptr to the first non-space character.  Fix it to correctly report
failure by setting endptr to the beginning of the input string.
The fix is from theraven@, who fixed this bug in wcstod() and
wcstold() in r227753.

While I'm here:

Move assignments out of declarations in wcstod() and wcstold().
This is against my personal preference, but it is our agreed style(9).

Set endptr correctly on malloc() failure in all three functions.

Remove an incorrect comment:  This is pointer arithmetic,
so the code was not actually making that assumption.

wcstold() advanced the wcp pointer beyond leading whitespace
and then reset it back to the beginning of the string.
Do not reset it.  This seems to have no functional effect,
since strtold_l() also skips leading whitespace.  I'm making
the change to keep this function consistent with wcstof() and
wcstod(), and because the C11 spec prescribes the use of iswspace()
to skip leading space.

Reported by:	libc++ unit test for std::stof(std::wstring)
MFC after:	8 days
Sponsored by:	Dell EMC
2016-11-20 20:13:22 +00:00
Ruslan Bukin
77bc2a1cd6 Locale fix for endian big (EB) machines.
We have locale files generated on EL machines (e.g. during cross-build
on amd64 host), but then we are using them on EB machines (e.g. MIPS64EB),
so proceed byte-swap if necessary.

All the libc tests passed successfully, including Russian collation.

Tested by:	br@, Hongyan Xia <hx242@cam.ac.uk>
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D8281
2016-11-01 13:54:44 +00:00
Ed Schouten
718fe473dd Change the return type of freelocale(3) to void.
Our version of this function currently returns an integer indicating
failure or success, whereas POSIX specifies that this function has no
return value. It returns void. Patch up the header, sources and man page
to use the right type. While there, use the opportunity to simplify the
body of this function.

Theoretically speaking, this change breaks the ABI of this function.
That said, I have yet to find any code that makes use of freelocale()'s
return value. I couldn't find any of it in the base system, nor did an
exp-run reveal any breakage caused by this change.

PR:		211394 (exp-run)
2016-07-29 17:18:47 +00:00
Pedro F. Giffuni
5c4d28f5dc libc: tag the Rune initialization function prototypes visibility as hidden.
It is good practice to export as few symbols as possible from your shared
libraries, so use the GCC visibility attribute in this case, matching what
Apple's libc does.

Reference:
https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/CppRuntimeEnv/Articles/SymbolVisibility.html

Hinted by:	Apple's libc 1082.20.4
MFC after:	1 week
2016-07-19 20:22:13 +00:00
Baptiste Daroussin
dee0bbbdca Revert 302324 and properly fix the crash with ISO-8859-5 locales
PR:		211135
Reported by:	jkim
Tested by:	jkim
MFC after:	2 days
2016-07-15 23:03:20 +00:00
Andrey A. Chernov
12eae8c8f3 1) Eliminate possibility to call __*collate_range_cmp() with inclomplete
locale (which cause core dump) by removing whole 'table' argument
by which it passed.

2) Restore __collate_range_cmp() in __sccl().

3) Collating [a-z] range in regcomp() only for single bytes locales
(we can't do it now for other ones). In previous state only first 256
wchars are considered and all others are just silently dropped from the
range.
2016-07-14 09:07:25 +00:00
Andrey A. Chernov
1daad8f5ad Back out non-collating [a-z] ranges.
Instead of changing whole course to another POSIX-permitted way
for consistency and uniformity I decide to completely ignore missing
regex fucntionality and concentrace on fixing bugs in what we have now,
too many small obstacles instead, counting ports.
2016-07-14 08:18:12 +00:00
Andrey A. Chernov
5a5807dd4c Remove broken support for collation in [a-z] type ranges.
Only first 256 wide chars are considered currently, all other are just
dropped from the range. Proper implementation require reverse tables
database lookup, since objects are really big as max UTF-8 (1114112
code points), so just the same scanning as it was for 256 chars will
slow things down.

POSIX does not require collation for [a-z] type ranges and does not
prohibit it for non-POSIX locales. POSIX require collation for ranges
only for POSIX (or C) locale which is equal to ASCII and binary for
other chars, so we already have it.

No other *BSD implements collation for [a-z] type ranges.

Restore ABI compatibility with unused now __collate_range_cmp() which
is visible from outside (will be removed later).
2016-07-10 03:49:38 +00:00
Baptiste Daroussin
a8bacd6f93 Fix a bad test resulting in a segfault with ISO-8859-5 locales
Reported by:	Lauri Tirkkonen from Illumos
Approved by:	re@ (gjb)
2016-07-03 15:00:12 +00:00
Pedro F. Giffuni
3c2c0c0443 libc/locale: Fix type breakage in __collate_range_cmp().
When collation support was brought in, the second and third
arguments in __collate_range_cmp() were changed from int to
wchar_t, breaking the ABI. Change them to a "char" type which
makes more sense and keeps the ABI compatible.

Also introduce __wcollate_range_cmp() which does work with wide
characters. This function is used only internally in libc so
we don't export it. Use the new function in glob(3), fnmatch(3),
and regexec(3).

PR:		179721
Suggested by:	ache. jilles
MFC after:	3 weeks (perhaps partial only)
2016-06-05 19:12:52 +00:00
Andrey A. Chernov
2f423a266a For EILSEQ case in mbsnrtowcs() and wcsnrtombs() update src to point to
the character after the one this conversion stopped at.

PR:     209907
Submitted by: Roel Standaert <roel@abittechnical.com> (partially)
MFC after:      3 days
2016-05-31 18:44:33 +00:00
Pedro F. Giffuni
32223c1b7d libc: spelling fixes.
Mostly on comments.
2016-04-30 01:24:24 +00:00
Baptiste Daroussin
49c4407313 Restore the original ascii.c from prior to r290494
It was doing the right thing, there was no need to "fail" to reinvent it from
none.c

Pointy hat:	bapt
Submitted by:	ache
2016-04-21 07:36:11 +00:00
Baptiste Daroussin
d3591d68a9 Check the returned value of memchr(3) before using it
Reported by:	Coverity
CID:		1338530
2016-04-20 20:44:30 +00:00
Pedro F. Giffuni
513004a23d libc: replace 0 with NULL for pointers.
While here also cleanup some surrounding code; particularly
drop some malloc() casts.

Found with devel/coccinelle.

Reviewed by:	bde (previous version - all new bugs are mine)
2016-04-10 19:33:58 +00:00
Andrey A. Chernov
ae7abb26b1 SJIS encoding don't have single byte characters >= 224
MFC after:      1 week
2016-04-04 15:56:14 +00:00
Andrey A. Chernov
e08c3b7c11 EUC-type encodings don't have single byte characters >= 128
This change should not be MFCed until new collate will be
MFCed first, because our old EUC tables have some hacks for
missing codesets.
2016-04-04 02:43:35 +00:00
Pedro F. Giffuni
45256214eb mbtowc(3): set errno to EILSEQ if an incomplete character is passed.
According to POSIX, The mbtowc() function shall fail if:
[EILSEQ] An invalid character sequence is detected.

Reviewed by:		bapt
Differential Revision:	https://reviews.freebsd.org/D5496

Obtained from:	OpenBSD (Ingo Schwarze)
MFC after:	1 month
2016-03-01 19:15:34 +00:00
Enji Cooper
0b5cc81d3b Link localeconv(3) to localeconv_l(3)
MFC after: 3 days
2015-11-25 09:12:30 +00:00
Baptiste Daroussin
87101cb572 return "US-ASCII" instead of "POSIX" for "C" and "POSIX" locales
as it used to be in previous version of the locales. Returning
"POSIX" has too many fallouts.
2015-11-10 08:11:27 +00:00
Baptiste Daroussin
403105944d nl_langinfo: Simplify case ladder
The NONE:US-ASCII case isn't necessary.  The "NONE:" case will handle
US-ASCII, so let's remove the redundant handling.

Submitted by:	marino
Obtained from:	DragonflyBSD
2015-11-09 22:29:47 +00:00
Baptiste Daroussin
22b87a3555 Readd ascii.c forgotten in r290618 2015-11-09 22:11:37 +00:00
Baptiste Daroussin
473aa0b7ee locales: Enforce US-ASCII encoding (limited to 7-bit)
The US-ASCII format was getting treated identically to POSIX.  It is
supposed to throw an ILSEQ errno if a value of 0x80 or greater is
encountered, so let's bring back the "ASCII" handling.

While here, change nl_codeset to return US-ASCII only when the encoding
really is "US-ASCII".  Before "C" and "POSIX" encoding returned this
string, so now they return "POSIX".

Discussed with:	ache
Submitted by:	marino
Obtained from:	DragonflyBSD
2015-11-09 22:06:22 +00:00