25 Commits

Author SHA1 Message Date
bapt
b43fe740b5 Lower the warnings again and remove the pragmas unsupported by gcc 4.2.1 2015-11-08 22:23:21 +00:00
bapt
6dbf9b92d9 Eliminate some gcc pragmas 2015-11-08 21:22:24 +00:00
bapt
88fd312cc5 Fix build of localedef(1) on arm where wchar_t is an unsigned int 2015-11-07 22:57:00 +00:00
bapt
b55edc44dd Rewrite the history part
Fix information about "Dragonfly-style" format which on freebsd is named
BSD-style

Noted by:	bdrewery
2015-11-07 21:07:40 +00:00
bapt
59e611f28e Improve localedef(1) manpage
Obtained from:	DragonflyBSD
2015-11-07 20:36:54 +00:00
bapt
cadf70cee5 Bump warning level 2015-11-07 20:31:23 +00:00
bapt
4e4bd7df36 Use const where needed instead of using pragmas to work around the warnings 2015-11-07 20:29:23 +00:00
bapt
cf3dd7e45d Make bsd declaration static 2015-11-07 20:27:31 +00:00
bapt
3e1b1042f9 Fix an off by one due to bad conversion from avl(3) to tree(3)
Re-add calloc, as it was not the issue, just the messenger

Submitted by:	dim
Found by:	Address Sanitizer
2015-11-07 19:54:40 +00:00
bapt
397e26d135 Run memset only after having checked the return of malloc
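
A minimal sketch of the ordering this commit describes, not the actual localedef code: the result of malloc is checked before memset touches the buffer. The helper name alloc_zeroed is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: allocate size bytes and zero them.  memset() runs
 * only once malloc() is known to have succeeded, so a NULL pointer is
 * never written through.
 */
static void *
alloc_zeroed(size_t size)
{
	void *p;

	/* malloc(0) may legitimately return NULL; keep that case out. */
	assert(size > 0);
	p = malloc(size);
	if (p == NULL)
		return (NULL);
	memset(p, 0, size);
	return (p);
}
```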
Submitted by:	pluknet
2015-11-07 16:45:51 +00:00
bapt
130223d5a3 Work around an issue on i386 to unbreak the build until the real issue is tracked
down
2015-11-07 16:22:29 +00:00
bapt
228f1040d8 Fix build on arm64 2015-11-07 15:03:45 +00:00
bapt
6b910e5e24 Add missing header 2015-11-07 12:11:17 +00:00
bapt
bd9546d6f6 Fix typo 2015-11-07 11:08:19 +00:00
bapt
8abd49c1b5 libc: Fix (and improve) nl_langinfo (CODESET)
The output of "locale charmap" is identical to the result of
nl_langinfo (CODESET) for any given locale.  The logic for returning the
codeset was very simplistic.  It just returned the portion of the locale name
after the period (e.g. fr_FR.ISO8859-1 returned "ISO8859-1").

When softlinks were added to locales, this broke.  e.g.:
   en_US returned ""
   fr_FR.UTF8 returned "UTF8"
   fr_FR.UTF-8 returned "UTF-8"
   zh_Hant_HK.Big5HKSCS returned "Big5HKSCS"
   zh_Hant_TW.Big5 returned "Big5"
   es_ES@euro returned ""
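
The old behaviour listed above can be reproduced with a few lines of C. This is only a sketch of the "text after the period" logic as the message describes it, not the previous libc source; the function name naive_codeset is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return whatever follows the first '.' in a locale
 * name, or the empty string when the name has no '.' at all.
 */
static const char *
naive_codeset(const char *locale_name)
{
	const char *dot;

	assert(locale_name != NULL);
	dot = strchr(locale_name, '.');
	return (dot != NULL ? dot + 1 : "");
}
```

Names without a period, such as en_US or es_ES@euro, yield a zero-length string under this approach, which is the failure reported here.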

In order to fix this properly, the named locale cannot be used to
determine the encoding.  This information was almost available in the
rune data.  Unfortunately, all the single byte encodings were listed
as "NONE" encoding.

So I adjusted localedef tool to provide more information about the
encoding.  For example, instead of "NONE", the LC_CTYPE used by
fr_FR.ISO8859-15 is now encoded as "NONE:ISO8859-15".  The locale
handlers now check if the first four characters of the encoding are
"NONE" and, if so, treat it as a single-byte encoding.

The nl_langinfo handling of CODESET was adjusted accordingly.  Now the
following is returned:
   en_US returns "ISO8859-1"
   fr_FR.UTF8 returns "UTF-8"
   fr_FR.UTF-8 returns "UTF-8"
   zh_Hant_HK.Big5HKSCS returns "Big5"
   zh_Hant_TW.Big5 returns "Big5"
   es_ES@euro returns "ISO8859-15"

As before, "C" and "POSIX" locales return "US-ASCII".  This is a big
improvement.  The result of nl_langinfo can never be a zero-length
string and it will always be exclusively one of the values of the
character maps of /usr/src/tools/tools/locale/etc/final-maps.
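
The "NONE" prefix check described in this message might look roughly like the sketch below. Both helper names are hypothetical, and how libc maps the result to the final CODESET value is not shown in this log, so that step is left out.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: an encoding is treated as single-byte when its
 * first four characters are "NONE" (e.g. "NONE" or "NONE:ISO8859-15").
 */
static int
is_single_byte(const char *encoding)
{
	assert(encoding != NULL);
	return (strncmp(encoding, "NONE", 4) == 0);
}

/*
 * Hypothetical helper: return the character set named after "NONE:",
 * or the encoding string itself when that prefix is absent.
 */
static const char *
charset_name(const char *encoding)
{
	assert(encoding != NULL);
	if (strncmp(encoding, "NONE:", 5) == 0)
		return (encoding + 5);
	return (encoding);
}
```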

Submitted by:	marino
Obtained from:	DragonflyBSD
2015-11-01 12:00:55 +00:00
bapt
1309f24ada Actually only T_ISDIGIT should be flagged as _E4 2015-10-19 14:48:31 +00:00
bapt
d3438cfcad With regard to ctype, digits (e.g. 0 to 9) and xdigits (the 0 to 9 portion
of hexadecimal numbers) are all considered "numbers".  (Note that while
all digits are numbers, not all numbers are digits).

Enhance localedef to automatically set the "number" characteristic when
it encounters a digit or xdigit definition. This fixes malfunctioning
isalnum(3).
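
As a rough illustration of the rule, assuming made-up flag bits rather than the real _CTYPE_* values and a hypothetical helper name, setting "digit" or "xdigit" on a character could pull in "number" like this:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits only; these are not the real _CTYPE_* values. */
#define EX_DIGIT	0x01u
#define EX_XDIGIT	0x02u
#define EX_NUMBER	0x04u
#define EX_ALL		(EX_DIGIT | EX_XDIGIT | EX_NUMBER)

/*
 * Hypothetical helper: merge newly parsed class bits into a character's
 * flags, adding "number" whenever "digit" or "xdigit" is being set.
 */
static uint32_t
add_class(uint32_t flags, uint32_t added)
{
	assert((added & ~EX_ALL) == 0);
	if ((added & (EX_DIGIT | EX_XDIGIT)) != 0)
		added |= EX_NUMBER;
	return (flags | added);
}
```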

Obtained from:	DragonflyBSD
2015-10-19 14:30:28 +00:00
bapt
caab9e01a4 eliminate need for "print" definition
By having space automatically classified as "print" type, we can
eliminate the print section from ctype src files completely (they
are just "graph" plus "<space>").
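
A small sketch of the idea, with illustrative flag bits and a hypothetical helper name rather than the real localedef internals: "print" is derived from "graph" plus the space character, so it never needs its own section in the source files.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits only; these are not the real _CTYPE_* values. */
#define EX_GRAPH	0x01u
#define EX_PRINT	0x02u
#define EX_ALL		(EX_GRAPH | EX_PRINT)

/*
 * Hypothetical helper: mark a character as "print" when it is already
 * "graph" or when it is the space character (code point 0x20).
 */
static uint32_t
derive_print(uint32_t wc, uint32_t flags)
{
	assert((flags & ~EX_ALL) == 0);
	if ((flags & EX_GRAPH) != 0 || wc == 0x20)
		flags |= EX_PRINT;
	return (flags);
}
```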

Obtained from:	Dragonfly
2015-10-13 20:45:29 +00:00
bapt
979669d923 Commit log from Dragonfly:
FreeBSD extended ctypes to include numbers (e.g. isnumber()) but never
actually implemented it.  The isnumber() function was equivalent to the
isdigit() function in every case.

Now that DragonFly's ctype source files have number definitions, the
number ctype can finally be implemented.  It's given a new flag _CTYPE_N.
The isalnum() and iswalnum() functions have been changed to use this
flag rather than the _CTYPE_D digit flag.
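
The difference between the two tests can be sketched as follows. The flag bits and function names are illustrative stand-ins, not the real _CTYPE_A, _CTYPE_D and _CTYPE_N definitions or the ctype.h macros.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits only; these are not the real _CTYPE_* values. */
#define EX_ALPHA	0x01u
#define EX_DIGIT	0x02u
#define EX_NUMBER	0x04u
#define EX_ALL		(EX_ALPHA | EX_DIGIT | EX_NUMBER)

/* Old-style test (hypothetical name): alphabetic or digit. */
static int
alnum_by_digit(uint32_t flags)
{
	assert((flags & ~EX_ALL) == 0);
	return ((flags & (EX_ALPHA | EX_DIGIT)) != 0);
}

/* New-style test (hypothetical name): alphabetic or number. */
static int
alnum_by_number(uint32_t flags)
{
	assert((flags & ~EX_ALL) == 0);
	return ((flags & (EX_ALPHA | EX_NUMBER)) != 0);
}
```

A character flagged as a number but not as a digit passes only the second test. The second test relies on the locale data also marking every digit as a number.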

While isalnum(), isnumber(), and their wide equivalents now return
different values in locale cases, the ishexnumber() and iswhexnumber()
functions are unchanged.  They are still aliases for isxdigit() and
iswxdigit().

Also change ctype.h for isdigit and isxdigit to use sbistype like the
other functions.

Obtained from:	dragonfly
2015-10-13 20:43:49 +00:00
bapt
2a77c3b71d Merge from HEAD 2015-08-25 20:14:50 +00:00
bapt
290b5e138a Pet mandoc -Tlint 2015-08-09 13:20:53 +00:00
bapt
5f9f8ced91 Convert localedef(1) from avl to RB trees 2015-08-08 22:57:17 +00:00
bapt
64b5feb0c7 Prefer static generation of functions 2015-08-08 22:01:54 +00:00
bapt
2a251e9d73 Convert ctype generation to Red Black tree 2015-08-08 21:53:02 +00:00
bapt
6209f2034d Add localedef(1), a locale definition generator tool
The localedef tool can read entire (and unmodified) CLDR posix definition
files, and generate all 6 LC categories: LC_COLLATE, LC_CTYPE, LC_TIME,
LC_NUMERIC, LC_MONETARY and LC_MESSAGES.

This tool has a long history with Solaris.  The Nexenta developers
modified it to read CLDR files and created the much richer collation
formats.  The libc collation functions have to be modified to read the
new format (called "BSD-1.0") and to handle the new data structures.

The result will be that locale-sensitive tools and functions will now
properly sort multibyte and unicode strings.

Obtained from:	Dragonfly
2015-08-07 23:53:31 +00:00