Restore some of the ctype definitions reported in the PR from pre-CLDR data,
namely the 0xE000-0xF8FF private use area and the 0xFF00-0xFFFF half- and
fullwidth punctuation.

While here, update tools/tools/locale/README based on my experience
rebuilding the locale data.

PR:		225692
Reviewed by:	bapt, cem (previous version)
Approved by:	re (gjb), kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D17471
yuripv 2018-10-11 18:30:12 +00:00
parent f410335413
commit a7a80f58ca
4 changed files with 81 additions and 16 deletions

@@ -6240,6 +6240,12 @@ graph <MEETEI_MAYEK_LETTER_E>;...;<MEETEI_MAYEK_VIRAMA>
 graph <MEETEI_MAYEK_LETTER_KOK>;...;<MEETEI_MAYEK_APUN_IYEK>
 digit <MEETEI_MAYEK_DIGIT_ZERO>;...;<MEETEI_MAYEK_DIGIT_NINE>
+**********************************************************************
+* 0xE000 - 0xF8FF Private Use Area (from pre-CLDR data)
+**********************************************************************
+graph <PRIVATE_USE_AREA-E000>;...;<PRIVATE_USE_AREA-F8FF>
 **********************************************************************
 * 0xFB50 - 0xFDFF Arabic Presentation Forms (differential)
 **********************************************************************
@@ -6278,6 +6284,17 @@ punct <SMALL_COMMA>;...;<SMALL_COMMERCIAL_AT>
 blank <ZERO_WIDTH_NO-BREAK_SPACE>
+**********************************************************************
+* 0xFF00 - 0xFFFF Half- and Fullwidth Punctuation (from pre-CLDR data)
+**********************************************************************
+punct <FULLWIDTH_EXCLAMATION_MARK>;...;<FULLWIDTH_SOLIDUS>;/
+<FULLWIDTH_COLON>;...;<FULLWIDTH_COMMERCIAL_AT>;/
+<FULLWIDTH_LEFT_SQUARE_BRACKET>;...;<FULLWIDTH_GRAVE_ACCENT>;/
+<FULLWIDTH_LEFT_CURLY_BRACKET>;...;<HALFWIDTH_KATAKANA_MIDDLE_DOT>;/
+<FULLWIDTH_CENT_SIGN>;...;<FULLWIDTH_WON_SIGN>;/
+<HALFWIDTH_FORMS_LIGHT_VERTICAL>;...;<HALFWIDTH_WHITE_CIRCLE>
 **********************************************************************
 * 0x10300 - 0x1032F Old Italic
 **********************************************************************
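
For reference, the ranges added in this file can be cross-checked against the Unicode Character Database. The following is a minimal sketch and not part of the commit. It assumes that the symbolic names map to Unicode character names by replacing `_` with a space, which holds for the fullwidth and halfwidth names used here but not for `<PRIVATE_USE_AREA-E000>`, so the private use area is given by code point instead. The helper names `codepoint` and `expand` are hypothetical, and the Unicode version consulted depends on the Python release.

```python
import unicodedata

# Endpoints of the punct ranges added by the second hunk, as symbolic names.
PUNCT_RANGES = [
    ("FULLWIDTH_EXCLAMATION_MARK", "FULLWIDTH_SOLIDUS"),
    ("FULLWIDTH_COLON", "FULLWIDTH_COMMERCIAL_AT"),
    ("FULLWIDTH_LEFT_SQUARE_BRACKET", "FULLWIDTH_GRAVE_ACCENT"),
    ("FULLWIDTH_LEFT_CURLY_BRACKET", "HALFWIDTH_KATAKANA_MIDDLE_DOT"),
    ("FULLWIDTH_CENT_SIGN", "FULLWIDTH_WON_SIGN"),
    ("HALFWIDTH_FORMS_LIGHT_VERTICAL", "HALFWIDTH_WHITE_CIRCLE"),
]


def codepoint(symbol):
    """Map a symbolic name to a code point (assumes '_' stands for ' ')."""
    return ord(unicodedata.lookup(symbol.replace("_", " ")))


def expand(first, last):
    """Return the inclusive list of code points between two symbols."""
    return list(range(codepoint(first), codepoint(last) + 1))


punct = [cp for first, last in PUNCT_RANGES for cp in expand(first, last)]

# The private use area from the first hunk, given directly by code point.
private_use = range(0xE000, 0xF8FF + 1)

for first, last in PUNCT_RANGES:
    print(f"U+{codepoint(first):04X}..U+{codepoint(last):04X}")
print(len(punct), "punct code points,", len(private_use), "private use code points")
```

Listing the resolved endpoints makes it easy to see which parts of the 0xFF00 - 0xFFFF block are left out of the punct class, such as the fullwidth digits and letters.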

@@ -2,23 +2,37 @@
 To generate the locales:
-Tools needed: java, perl, devel/p5-Tie-IxHash, converters/p5-Text-Iconv and
-textproc/p5-XML-Parser
+Tools needed:
+java (openjdk >= 8)
+perl
+converters/p5-Text-Iconv
+devel/p5-Tie-IxHash
+textproc/p5-XML-Parser
-fetch cldr data from: http://cldr.unicode.org
-extract in a directory ~/unicode/cldr/v30.0.3 for example
-fetch unidata from http://www.unicode.org/Public/zipped/ (latest version)
-extract in a directory ~/unicode/UNIDATA/9.0.0 for example
+Fetch CLDR data from: http://unicode.org/Public/cldr/. You need all of the
+core.zip, keyboards.zip, and tools.zip.
-Note that the prebuilt cldr tools are not working on freebsd, it needs to
-be rebuilt:
-cd $CLDRDIR/tools/java
-ant build
+Extract:
+mkdir -p ~/unicode/cldr/v33.0
+cd ~/unicode/cldr/v33.0
+unzip ~/core.zip ~/keyboards.zip ~/tools.zip
-either modify tools/tools/locales/etc/unicode.conf or export variables:
-CLDRDIR="~/unicode/cldr/v30.0.3"
-UNIDATADIR="~/unicode/UNIDATA/9.0.0"
+Fetch unidata (UCD.zip) from http://www.unicode.org/Public/zipped/latest.
-run:
-make POSIX
-make install
+Extract:
+mkdir -p ~/unicode/UNIDATA/11.0.0
+cd ~/unicode/UNIDATA/11.0.0
+unzip ~/UCD.zip
+Either modify tools/tools/locales/etc/unicode.conf or export variables:
+CLDRDIR=~/unicode/cldr/v33.0; export CLDRDIR
+UNIDATADIR=~/unicode/UNIDATA/9.0.0; export UNIDATADIR
+Build the CLDR tools:
+cd $CLDRDIR/tools/java
+ant jar
+Run:
+make POSIX
+make
+make install
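
The extraction and environment steps from the updated README can be sketched as below. This is a hedged illustration, not the project's tooling: the helper names `extract_archives` and `check_env` are hypothetical. It extracts one archive at a time, because `unzip` takes a single archive and treats further arguments as member names to extract. It also only reports whether `CLDRDIR` and `UNIDATADIR` name existing directories. Note that the README extracts UCD.zip into ~/unicode/UNIDATA/11.0.0 while the exported `UNIDATADIR` example names 9.0.0, so the variable presumably needs to point at the directory that was actually populated.

```python
import os
import zipfile
from pathlib import Path


def extract_archives(archives, dest):
    """Extract each zip archive into dest, one archive at a time."""
    dest = Path(dest).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with zipfile.ZipFile(Path(archive).expanduser()) as zf:
            zf.extractall(dest)
    return dest


def check_env(names=("CLDRDIR", "UNIDATADIR"), environ=None):
    """Return the variables that are unset or do not name a directory."""
    environ = os.environ if environ is None else environ
    problems = []
    for name in names:
        value = environ.get(name)
        if not value or not Path(value).expanduser().is_dir():
            problems.append(name)
    return problems


# Example calls, following the README's paths (assumed download location ~/):
#   extract_archives(["~/core.zip", "~/keyboards.zip", "~/tools.zip"],
#                    "~/unicode/cldr/v33.0")
#   extract_archives(["~/UCD.zip"], "~/unicode/UNIDATA/11.0.0")

if __name__ == "__main__":
    # Report which of the README's variables still need attention.
    print("unset or missing:", check_env())
```

Running the check before `make POSIX` catches a stale or mistyped directory early instead of partway through the build.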

@@ -6240,6 +6240,12 @@ graph <MEETEI_MAYEK_LETTER_E>;...;<MEETEI_MAYEK_VIRAMA>
 graph <MEETEI_MAYEK_LETTER_KOK>;...;<MEETEI_MAYEK_APUN_IYEK>
 digit <MEETEI_MAYEK_DIGIT_ZERO>;...;<MEETEI_MAYEK_DIGIT_NINE>
+**********************************************************************
+* 0xE000 - 0xF8FF Private Use Area (from pre-CLDR data)
+**********************************************************************
+graph <PRIVATE_USE_AREA-E000>;...;<PRIVATE_USE_AREA-F8FF>
 **********************************************************************
 * 0xFB50 - 0xFDFF Arabic Presentation Forms (differential)
 **********************************************************************
@@ -6278,6 +6284,17 @@ punct <SMALL_COMMA>;...;<SMALL_COMMERCIAL_AT>
 blank <ZERO_WIDTH_NO-BREAK_SPACE>
+**********************************************************************
+* 0xFF00 - 0xFFFF Half- and Fullwidth Punctuation (from pre-CLDR data)
+**********************************************************************
+punct <FULLWIDTH_EXCLAMATION_MARK>;...;<FULLWIDTH_SOLIDUS>;/
+<FULLWIDTH_COLON>;...;<FULLWIDTH_COMMERCIAL_AT>;/
+<FULLWIDTH_LEFT_SQUARE_BRACKET>;...;<FULLWIDTH_GRAVE_ACCENT>;/
+<FULLWIDTH_LEFT_CURLY_BRACKET>;...;<HALFWIDTH_KATAKANA_MIDDLE_DOT>;/
+<FULLWIDTH_CENT_SIGN>;...;<FULLWIDTH_WON_SIGN>;/
+<HALFWIDTH_FORMS_LIGHT_VERTICAL>;...;<HALFWIDTH_WHITE_CIRCLE>
 **********************************************************************
 * 0x10300 - 0x1032F Old Italic
 **********************************************************************

@@ -876,6 +876,12 @@ graph <MEETEI_MAYEK_LETTER_E>;...;<MEETEI_MAYEK_VIRAMA>
 graph <MEETEI_MAYEK_LETTER_KOK>;...;<MEETEI_MAYEK_APUN_IYEK>
 digit <MEETEI_MAYEK_DIGIT_ZERO>;...;<MEETEI_MAYEK_DIGIT_NINE>
+**********************************************************************
+* 0xE000 - 0xF8FF Private Use Area (from pre-CLDR data)
+**********************************************************************
+graph <PRIVATE_USE_AREA-E000>;...;<PRIVATE_USE_AREA-F8FF>
 **********************************************************************
 * 0xFB50 - 0xFDFF Arabic Presentation Forms (differential)
 **********************************************************************
@@ -914,6 +920,17 @@ punct <SMALL_COMMA>;...;<SMALL_COMMERCIAL_AT>
 blank <ZERO_WIDTH_NO-BREAK_SPACE>
+**********************************************************************
+* 0xFF00 - 0xFFFF Half- and Fullwidth Punctuation (from pre-CLDR data)
+**********************************************************************
+punct <FULLWIDTH_EXCLAMATION_MARK>;...;<FULLWIDTH_SOLIDUS>;/
+<FULLWIDTH_COLON>;...;<FULLWIDTH_COMMERCIAL_AT>;/
+<FULLWIDTH_LEFT_SQUARE_BRACKET>;...;<FULLWIDTH_GRAVE_ACCENT>;/
+<FULLWIDTH_LEFT_CURLY_BRACKET>;...;<HALFWIDTH_KATAKANA_MIDDLE_DOT>;/
+<FULLWIDTH_CENT_SIGN>;...;<FULLWIDTH_WON_SIGN>;/
+<HALFWIDTH_FORMS_LIGHT_VERTICAL>;...;<HALFWIDTH_WHITE_CIRCLE>
 **********************************************************************
 * 0x10300 - 0x1032F Old Italic
 **********************************************************************
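
The entry syntax used throughout these hunks (a class keyword, `<FIRST>;...;<LAST>` ranges, and `;/` at the end of a continued line) can be read mechanically. The sketch below is illustrative only and is not how the FreeBSD tools process the files. It assumes that `*` starts a comment line and that `/` is the line-continuation character, as the hunks suggest; real locale sources declare these with `comment_char` and `escape_char`. The names `logical_lines` and `parse` are hypothetical, and `SAMPLE` is a handful of lines copied from the hunks.

```python
import re

SAMPLE = """\
**********************************************************************
* 0xE000 - 0xF8FF Private Use Area (from pre-CLDR data)
**********************************************************************
graph <PRIVATE_USE_AREA-E000>;...;<PRIVATE_USE_AREA-F8FF>
punct <FULLWIDTH_CENT_SIGN>;...;<FULLWIDTH_WON_SIGN>;/
<HALFWIDTH_FORMS_LIGHT_VERTICAL>;...;<HALFWIDTH_WHITE_CIRCLE>
blank <ZERO_WIDTH_NO-BREAK_SPACE>
"""

# One symbol, optionally followed by ";...;<LAST>" to form a range.
ITEM = re.compile(r"<([^>]+)>(?:;\.\.\.;<([^>]+)>)?")


def logical_lines(text, comment="*", escape="/"):
    """Drop comment lines and join lines that end with the escape character."""
    pending = ""
    for line in text.splitlines():
        line = line.strip()
        if not pending and (not line or line.startswith(comment)):
            continue
        if line.endswith(escape):
            pending += line[:-1]
            continue
        yield pending + line
        pending = ""


def parse(text):
    """Return (class, first, last) tuples; last == first for single symbols."""
    entries = []
    for line in logical_lines(text):
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        cls, rest = parts
        for first, last in ITEM.findall(rest):
            entries.append((cls, first, last or first))
    return entries


for entry in parse(SAMPLE):
    print(entry)
```

Because the continuation is joined before matching, the two physical `punct` lines in the sample yield two ranges under a single class keyword.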