Commit Graph

71 Commits

Author SHA1 Message Date
Yuri Pankov
9eb7d595e1 Reset persistent mbstates when rune locale encoding changes.
This was shown to be a problem by side effect of now-enabled test case,
which was going through C, en_US.UTF-8, ja_JP.SJIS, and ja_JP.eucJP,
and failing eventually as data in mbrtowc's mbstate, that was
perfectly correct for en_US.UTF-8 was treated as incorrect for
ja_JP.SJIS, failing the entire test case.

This makes the persistent mbstates to be per ctype-component,
and not per-locale so we could easily reset the mbstates when
only LC_CTYPE is changed.

Reviewed by:	bapt, pfg
Approved by:	kib (mentor, implicit)
Differential Revision:	https://reviews.freebsd.org/D17796
2018-11-09 03:32:53 +00:00
Pedro F. Giffuni
8a16b7a18f General further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:49:47 +00:00
Bryan Drewery
dc8507e1f7 __setrunelocale: Fix asprintf(3) failure not returning an error.
Also fix the style of the asprintf(3) call in __collate_load_tables_l().
Both of these lines were modified away from snprintf(3) during the
import from DragonFly/Illumos.

Reviewed by:	jilles (briefly over shoulder)
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-09-29 16:30:50 +00:00
Pedro F. Giffuni
be53a489c6 libc: minor indent(1) cleanups.
Illumos and Schillix is adopting some of the locale code and our style(9)
sometimes matches the Solaris cstyle, so the changes are also useful as a
way to reduce diffs.

No functional change.

Discussed with: Joerg Schilling
MFC after:	1 week
2017-08-26 16:11:21 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Baptiste Daroussin
473aa0b7ee locales: Enforce US-ASCII encoding (limited to 7-bit)
The US-ASCII format was getting treated identically to POSIX.  It is
supposed to throw an ILSEQ errno if a value of 0x80 or greater is
encountered, so let's bring back the "ASCII" handling.

While here, change nl_codeset to return US-ASCII only when the encoding
really is "US-ASCII".  Before "C" and "POSIX" encoding returned this
string, so now they return "POSIX".

Discussed with:	ache
Submitted by:	marino
Obtained from:	DragonflyBSD
2015-11-09 22:06:22 +00:00
Baptiste Daroussin
d79cdd21de libc: Fix (and improve) nl_langinfo (CODESET)
The output of "locale charmap" is identical to the result of
nl_langinfo (CODESET) for any given locale.  The logic for returning the
codeset was very simplistic.  It just returned portion of the locale name
after the period (e.g. en_FR.ISO8859-1 returned "ISO8859-1").

When softlinks were added to locales, this broke.  e.g.:
   en_US returned ""
   en_FR.UTF8 returned "UTF8"
   en_FR.UTF-8 returned "UTF-8"
   zh_Hant_HK.Big5HKSCS returned "Big5HKSCS"
   zh_Hant_TW.Big5 returned "Big5"
   es_ES@euro returned ""

In order to fix this properly, the named locale cannot be used to
determine the encoding.  This information was almost available in the
rune data.  Unfortunately, all the single byte encodings were listed
as "NONE" encoding.

So I adjusted localedef tool to provide more information about the
encoding.  For example, instead of "NONE", the LC_CTYPE used by
fr_FR.ISO8859-15 is now encoded as "NONE:ISO8859-15".  The locale
handlers now check if the first four characters of the encoding is
"NONE" and if so, treats it as a single-byte encoding.

The nl_langinfo handling of CODESET was adjusting accordingly.  Now the
following is returned:
   en_US returns "ISO8859-1"
   fr_FR.UTF8 returns "UTF-8"
   fr_FR.UTF-8 returns "UTF-8"
   zh_Hant_HK.Big5HKSCS returns "Big5"
   zh_Hant_TW.Big5 returns "Big5"
   es_ES@euro returns "ISO8859-15"

as before, "C" and "POSIX" locales return "US-ASCII".  This is a big
improvement.  The result of nl_langinfo can never be a zero-length
string and it will always exclusively one of the values of the
character maps of /usr/src/tools/tools/locale/etc/final-maps.

Submitted by:	marino
Obtained from:	DragonflyBSD
2015-11-01 12:00:55 +00:00
Baptiste Daroussin
28a20bb3f5 Use more asprintf
Plug memory leak introduced in previous asprintf addition
2015-08-09 12:13:30 +00:00
Baptiste Daroussin
7b2473410f Revamp CTYPE support (from Illumos & Dragonfly)
Obtained from:	Dragonfly
2015-08-08 18:22:14 +00:00
Baptiste Daroussin
2a6abeebef The collate functions within libc have been using version 1 and 1.2 of the
packed LC_COLLATE binary formats. These were generated with the colldef
tool, but the new LC_COLLATE files are going to be generated by the new
localedef tool using CLDR POSIX files as input.  The BSD-flavored
version of localedef identifies the format as "BSD 1.0".  Any
LC_COLLATE file with a different version will simply not be loaded, and
all LC* categories will get set to "C" (aka "POSIX") locale.

This work is based off of Nexenta's contribution to Illumos.
The integration with xlocale is John Marino's work for Dragonfly.

The following commits will enable localedef tool, disable the colldef
tool, add generated colldef directory, and finally remove colldef from
base.

The only difference with Dragonfly are:
- a few fixes to build with clang
- And identification of the flavor as "BSD 1.0" instead of "Dragonfly 4.4"

Obtained from:	Dragonfly
2015-08-07 23:41:26 +00:00
David Chisnall
8d07b7deff Fix an issue where the locale and rune locale could become out of sync,
causing mb* functions (and similar) to be called with the wrong data
(possibly a null pointer, causing a crash).

PR:		standards/188036
MFC after:	1 week
2014-04-02 11:10:46 +00:00
Jilles Tjoelker
07588bf421 libc: Make various internal file descriptors close-on-exec.
These are obtained via fopen().
2012-12-11 22:52:56 +00:00
Brooks Davis
1652751915 Improve style(9) compliance of function declarations. 2012-12-10 17:34:33 +00:00
David Chisnall
bb4317bf3c Restore the __collate_load_error global that was accidentally removed in the
xlocale refactoring.

MFC after:	1 week
2012-07-06 20:16:22 +00:00
David Chisnall
1967332863 Fix a leak when setting the global character locale to "C" from something else.
Reported by:	mm
2012-06-11 14:02:02 +00:00
David Chisnall
a8ed63bb3d Reapply 227753 (xlocale cleanup), plus some fixes so that it passes build
universe with gcc.

Approved by:	dim (mentor)
2012-03-04 15:31:13 +00:00
Dimitry Andric
b74cf6dcf1 Revert r231673 and r231682 for now, until we can run a full make
universe with them.  Sorry for the breakage.

Pointy hat to:	     me and brooks
2012-02-14 21:48:46 +00:00
David Chisnall
7780c181c5 Fix a misplaced __NO_TLS locations, and change a GNUism to a C11ism for
consistency.

Approved by:	brooks (mentor)
2012-02-14 14:24:37 +00:00
David Chisnall
82dd5016bd Cleanup of xlocale:
- Address performance regressions encountered by das@ by caching per-thread
  data in TLS where available.
- Add a __NO_TLS flag to cdefs.h to indicate where not available.
- Reorganise the xlocale.h definitions into xlocale/*.h so that they can be
  included from multiple places.
- Export the POSIX2008 subset of xlocale when POSIX2008 says it should be
  exported, independently of whether xlocale.h is included.
- Fix the bug where programs using ctype functions always assumed ASCII unless
  recompiled.
- Fix some style(9) violations.

Reviewed by:	brooks (mentor)
Approved by:	dim (mentor)
2012-02-14 12:03:23 +00:00
David Chisnall
3c87aa1d3d Implement xlocale APIs from Darwin, mainly for use by libc++. This adds a
load of _l suffixed versions of various standard library functions that use
the global locale, making them take an explicit locale parameter.  Also
adds support for per-thread locales.  This work was funded by the FreeBSD
Foundation.

Please test any code you have that uses the C standard locale functions!

Reviewed by:    das (gdtoa changes)
Approved by:    dim (mentor)
2011-11-20 14:45:42 +00:00
Andrey A. Chernov
67e7bdee55 Fix longstanding mb/wc functions segfault if error occurse
inside _<encoding>_init().
Currently _EUC_init() only was affected.
2008-01-23 03:05:35 +00:00
Andrey A. Chernov
5ebf111155 Better fix for longstanding segfault. Don't touch current locale at all
on unknown encoding. Previous fix resets it to POSIX.
2008-01-23 02:17:27 +00:00
Andrey A. Chernov
5776848851 1) Add (void) cast to _none_init() (while I am here)
2) Fix longstanding segfault in mb/wc code when unknown encoding is specified
in the locale file (mb/wc functions becomes NULL in that case).
2008-01-23 01:57:26 +00:00
Andrey A. Chernov
91e0bf6a77 Introduce new encoding: "ASCII"
It differs from default C/POSIX "NONE" mainly by stricter 8bit check
for mb*towc*/wc*tomb* family, returning EILSEQ
2008-01-21 23:48:12 +00:00
Andrey A. Chernov
367ed4e13d The problem is: currently our single byte ctype(3) functions are broken
for wide characters locales in the argument range >= 0x80 - they may
return false positives.

Example 1: for UTF-8 locale we currently have:
iswspace(0xA0)==1 and isspace(0xA0)==1
(because iswspace() and isspace() are the same code)
but must have
iswspace(0xA0)==1 and isspace(0xA0)==0
(because there is no such character and all others in the range
0x80..0xff for the UTF-8 locale, it keeps ASCII only in the single byte
range because our internal wchar_t representation for UTF-8 is UCS-4).

Example 2: for all wide character locales isalpha(arg) when arg > 0xFF may
return false positives (must be 0).
(because iswalpha() and isalpha() are the same code)

This change address this issue separating single byte and wide ctype
and also fix iswascii() (currently iswascii() is broken for
arguments > 0xFF).
This change is 100% binary compatible with old binaries.

Reviewied by: i18n@
2007-10-13 16:28:22 +00:00
Warner Losh
c879ae3536 Per Regents of the University of Calfornia letter, remove advertising
clause.

# If I've done so improperly on a file, please let me know.
2007-01-09 00:28:16 +00:00
Alexey Zelkin
e94c6cb4a2 . Static'ize functions exported via function reference variables only.
. Replace inclusion of sys/param.h to sys/cdefs.h and sys/types.h where
  appropriate.
. move _*_init() prototypes to mblocal.h, and remove these prototypes
  from .c files
. use _none_init() in __setrunelocale() instead of duplicating code
. move __mb* variables from table.c to none.c allowing us to not to
  export _none_*() externs, and appropriately remove them from mblocal.h

Ok'ed by:	tjr
2005-02-27 15:11:09 +00:00
Andrey A. Chernov
27ecbe8a77 Remove setrunelocale() 2004-10-18 02:06:18 +00:00
Tim J. Robbins
31d330fb2a Remove the obsolete <rune.h> interface. 2004-10-17 06:51:50 +00:00
Tim J. Robbins
79a3948997 Remove support for the obsolete UTF2 encoding. 2004-10-17 02:29:15 +00:00
Tim J. Robbins
5a52f3c22c Change "deprecated" in link-time warnings about various rune functions
to "obsolete".
2004-08-21 07:48:06 +00:00
Tim J. Robbins
1949a3470f Implement the GNU extensions of mbsnrtowcs() and wcsnrtombs(). These are
convenient when the source string isn't null-terminated.

Implement the other conversion functions (mbstowcs(), mbsrtowcs(), wcstombs(),
wcsrtombs()) in terms of these new functions.
2004-07-21 10:54:57 +00:00
Tim J. Robbins
ddc1eded85 Prefix the names of members of _RuneLocale and its sub-structures
with ``__'' to avoid polluting the namespace. This doesn't change the
documented rune interface at all, but breaks applications that accessed
_RuneLocale directly.
2004-06-23 07:01:44 +00:00
Tim J. Robbins
ea4ac135ff Allow encoding modules to override the default implementations of
mbsrtowcs() and wcsrtombs(). Provide a fast implementation for the
trivial "NONE" encoding.
2004-05-13 11:20:27 +00:00
Tim J. Robbins
2051a8f2d5 Move prototypes of various encoding-related functions into a new header
file to avoid extern'ing them all over the place.
2004-05-12 14:09:04 +00:00
Tim J. Robbins
ca2dae426e Allow partial multibyte characters to accumulate in conversion state
objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required
by C99.
2004-04-07 10:48:19 +00:00
Tim J. Robbins
4fb9e805dc Remove support for emulating mbrtowc() and wcrtomb() in terms of the
old rune interface now that it is no longer needed.
2004-04-04 11:31:29 +00:00
Tim J. Robbins
40c5c1f8a1 Set __mbrtowc and __wcrtomb correctly when changing to the C/POSIX locale.
Save __mbrtowc and __wcrtomb and restore them when changing back to
the cached locale.

Reported by:	perky
2003-12-08 23:52:22 +00:00
David Xu
6d7a04b013 Add gb2312 encoding. 2003-11-05 22:52:51 +00:00
Tim J. Robbins
d4f6cd06dd Allow mbrtowc() and wcrtomb() to be implemented directly, instead of
as wrappers around the deprecated 4.4BSD rune functions. This paves the
way for state-dependent encodings, which the rune API does not support.
- Add __emulated_sgetrune() and __emulated_sputrune(), which are
  implementations of sgetrune() and sputrune() in terms of
  mbrtowc() and wcrtomb().
- Rename the old rune-wrapper mbrtowc() and wcrtomb() functions to
  __emulated_mbrtowc() and __emulated_wcrtomb().
- Add __mbrtowc and __wcrtomb function pointers, which point to the
  current locale's conversion functions, or the __emulated versions.
- Implement mbrtowc() and wcrtomb() as calls to these function pointers.
- Make the "NONE" encoding implement mbrtowc() and wcrtomb() directly.

All of this emulation mess will be removed, together with rune support,
in FreeBSD 6.
2003-11-01 05:13:13 +00:00
Andrey A. Chernov
a03081087c Add support for gb18030 encoding
PR:             51729
Submitted by:   Kang Liu <liukang@bjpu.edu.cn>
2003-07-29 07:52:44 +00:00
Andrey A. Chernov
8b2749e901 Add const to __setrunelocale prototype 2003-07-06 04:01:09 +00:00
Andrey A. Chernov
68d429c3fc Reorganize wrapper around setrunelocale() to mark it as deprecated
in FreeBSD 6
2003-07-06 02:03:37 +00:00
Alexey Zelkin
fca2738d67 Reduce code duplication by separating _PathLocle detection code into
internal helper function.
2003-06-25 22:42:33 +00:00
Andrey A. Chernov
9d793e98ec Add GBK encoding
PR:             51504
Submitted by:   Statue <statue@freebsd.sinica.edu.tw>
2003-06-01 15:30:56 +00:00
Tim J. Robbins
972baa3747 Add a UTF-8 encoding method, which will eventually replace the antique
"UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible
for 16-bit characters, the new UTF-8 implementation is much more strict
about rejecting malformed input and also handles the full 31 bit range
of characters.
2002-10-10 22:56:18 +00:00
Andrey A. Chernov
ec5ca2eba7 Add safeguards to never use errno == 0 as setrunelocale() error return code 2002-08-09 08:22:29 +00:00
Andrey A. Chernov
76692b8025 Rewrite locale loading procedures, so any load failure will not affect
currently cached data.  It allows a number of nice things, like: removing
fallback code from single locale loading, remove memory leak when LC_CTYPE
data loaded again and again, efficient cache use, not only for
setlocale(locale1); setlocale(locale1), but for setlocale(locale1);
setlocale("C"); setlocale(locale1) too (i.e.  data file loaded only once).
2002-08-08 05:51:54 +00:00
Andrey A. Chernov
57473ad215 Reset __mb_cur_max to 1 when "C" or "POSIX" locales loaded after multibyte one 2002-08-07 20:49:25 +00:00
Andrey A. Chernov
2f6754febb Catch empty encoding name too 2002-08-03 17:09:21 +00:00