From 8ca5fa518c53fb09f1426de979e3b9f6f226b035 Mon Sep 17 00:00:00 2001
From: "Tim J. Robbins"
Date: Sun, 10 Aug 2003 09:23:51 +0000
Subject: [PATCH] Add manual pages for the BIG5, GB18030 and MSKanji encodings.

These may need to be fleshed out a little, especially big5(5).
---
 lib/libc/locale/Makefile.inc |  2 +-
 lib/libc/locale/big5.5       | 48 ++++++++++++++++++++++++
 lib/libc/locale/gb18030.5    | 71 ++++++++++++++++++++++++++++++++++++
 lib/libc/locale/mskanji.5    | 69 +++++++++++++++++++++++++++++++++++
 4 files changed, 189 insertions(+), 1 deletion(-)
 create mode 100644 lib/libc/locale/big5.5
 create mode 100644 lib/libc/locale/gb18030.5
 create mode 100644 lib/libc/locale/mskanji.5

diff --git a/lib/libc/locale/Makefile.inc b/lib/libc/locale/Makefile.inc
index 5d00d19a56f4..feab8b65a9bb 100644
--- a/lib/libc/locale/Makefile.inc
+++ b/lib/libc/locale/Makefile.inc
@@ -31,7 +31,7 @@ MAN+=	btowc.3 \
 	wcsrtombs.3 wcstod.3 wcstol.3 \
 	wctrans.3 wctype.3 wcwidth.3
 MAN+=	euc.4 utf2.4
-MAN+=	utf8.5
+MAN+=	big5.5 gb18030.5 mskanji.5 utf8.5
 
 MLINKS+=btowc.3 wctob.3
 MLINKS+=isdigit.3 isnumber.3
diff --git a/lib/libc/locale/big5.5 b/lib/libc/locale/big5.5
new file mode 100644
index 000000000000..77fe06ab3e3e
--- /dev/null
+++ b/lib/libc/locale/big5.5
@@ -0,0 +1,48 @@
+.\" Copyright (c) 2002, 2003 Tim J. Robbins
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.Dd August 7, 2003
+.Dt BIG5 5
+.Os
+.Sh NAME
+.Nm big5
+.Nd "``Big Five'' encoding for Traditional Chinese text"
+.Sh SYNOPSIS
+.Nm ENCODING
+.Qq BIG5
+.Sh DESCRIPTION
+.Dq Big Five
+is the de facto standard for encoding Traditional Chinese text.
+Each character is represented by either one or two bytes.
+Characters from the
+.Tn ASCII
+character set are represented as single bytes in the range 0x00 - 0x7F.
+Traditional Chinese characters are represented by two bytes:
+the first in the range 0xA1 - 0xFE, the second in the range
+0x40 - 0xFE.
+.Sh SEE ALSO
+.Xr euc 4 ,
+.Xr gb18030 5 ,
+.Xr utf8 5
diff --git a/lib/libc/locale/gb18030.5 b/lib/libc/locale/gb18030.5
new file mode 100644
index 000000000000..acffd41fe7c1
--- /dev/null
+++ b/lib/libc/locale/gb18030.5
@@ -0,0 +1,71 @@
+.\" Copyright (c) 2002, 2003 Tim J. Robbins
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.Dd August 8, 2003
+.Dt GB18030 5
+.Os
+.Sh NAME
+.Nm gb18030
+.Nd "GB 18030 encoding method for Chinese text"
+.Sh SYNOPSIS
+.Nm ENCODING
+.Qq GB18030
+.Sh DESCRIPTION
+The
+.Nm GB18030
+encoding implements GB 18030-2000, a PRC national standard for the encoding of
+Chinese characters.
+It is a superset of the older GB\ 2312-1980 and GBK encodings,
+and incorporates Unicode's Unihan Extension A completely.
+It also provides code space for all Unicode 3.0 code points.
+.Pp
+Multibyte characters in the
+.Nm GB18030
+encoding can be one byte, two bytes, or
+four bytes long.
+There are a total of over 1.5 million code positions.
+.Pp
+.No GB\ 11383-1981 ( Ns
+.Tn ASCII )
+characters are represented by single bytes in the range 0x00 to 0x7F.
+.Pp
+Chinese characters are represented as either two bytes or four bytes.
+Characters that are represented by two bytes begin with a byte in the range
+0x81-0xFE and end with a byte either in the range 0x40-0x7E or 0x80-0xFE.
+.Pp
+Characters that are represented by four bytes begin with a byte in the range
+0x81-0xFE, have a second byte in the range 0x30-0x39, a third byte in the range
+0x81-0xFE and a fourth byte in the range 0x30-0x39.
+.Sh SEE ALSO
+.Xr euc 4 ,
+.Xr utf8 5
+.Rs
+.%T "Chinese National Standard GB 18030-2000: Information Technology -- Chinese ideograms coded character set for information interchange -- Extension for the basic set"
+.%D "March 2000"
+.Re
+.Sh STANDARDS
+The
+.Nm GB18030
+encoding is believed to be compatible with GB 18030-2000.
diff --git a/lib/libc/locale/mskanji.5 b/lib/libc/locale/mskanji.5
new file mode 100644
index 000000000000..663ec6714b95
--- /dev/null
+++ b/lib/libc/locale/mskanji.5
@@ -0,0 +1,69 @@
+.\" Copyright (c) 2002, 2003 Tim J. Robbins
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.Dd August 7, 2003
+.Dt MSKANJI 5
+.Os
+.Sh NAME
+.Nm mskanji
+.Nd "Shift-JIS (MS Kanji) encoding for Japanese text"
+.Sh SYNOPSIS
+.Nm ENCODING
+.Qq MSKANJI
+.Sh DESCRIPTION
+Shift-JIS, also known as MS Kanji or SJIS, is an encoding system for
+Japanese characters, developed by Microsoft Corporation.
+It encodes the characters from the
+.Tn JIS
+X 0201 (ASCII/JIS-Roman) and
+.Tn JIS
+X 0208 (Japanese) character sets as sequences of either one or two bytes.
+.Pp
+Characters from the
+.Tn ASCII Ns
+/JIS-Roman character set are encoded as single bytes between 0x00 and 0x7F
+(ASCII) or 0xA1 and 0xDF (Half-width katakana).
+.Pp
+Characters from the
+.Tn JIS
+X 0208 character set are encoded as two bytes.
+The first ranges from
+0x81 - 0x9F, 0xE0 - 0xEA, 0xED - 0xEE (not
+.Tn JIS Ns :
+.Tn NEC Ns - Ns
+selected
+.Tn IBM
+extended characters),
+0xF0 - 0xF9 (not
+.Tn JIS Ns :
+user defined),
+or 0xFA - 0xFC (not
+.Tn JIS Ns :
+.Tn IBM
+extended characters).
+The second byte ranges from 0x40 - 0xFC, excluding 0x7F (delete).
+.Sh SEE ALSO
+.Xr euc 4 ,
+.Xr utf8 5
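The byte ranges documented in big5(5), gb18030(5) and mskanji(5) can be condensed into a small C sketch that reports how long the character at the start of a buffer is. This is not the libc decoder: `big5_charlen`, `gb18030_charlen`, `mskanji_charlen`, `count_chars` and the `CHAR_INVALID`/`CHAR_INCOMPLETE` codes are hypothetical names, and the checks follow only the ranges stated in the pages, not the full character tables.

```c
#include <stddef.h>

/* Hypothetical result codes, loosely modelled on mbrtowc(3)'s
 * (size_t)-1 (invalid) and (size_t)-2 (incomplete) returns. */
enum { CHAR_INVALID = -1, CHAR_INCOMPLETE = -2 };

/* big5(5): ASCII is 0x00-0x7F; two-byte characters have a lead byte in
 * 0xA1-0xFE and a trail byte in 0x40-0xFE.  Big5 as usually specified
 * only uses trail bytes 0x40-0x7E and 0xA1-0xFE; this follows the page. */
int big5_charlen(const unsigned char *s, size_t n)
{
	if (n == 0)
		return CHAR_INCOMPLETE;
	if (s[0] <= 0x7F)
		return 1;
	if (s[0] < 0xA1 || s[0] > 0xFE)
		return CHAR_INVALID;
	if (n < 2)
		return CHAR_INCOMPLETE;
	if (s[1] < 0x40 || s[1] > 0xFE)
		return CHAR_INVALID;
	return 2;
}

/* gb18030(5): ASCII is one byte; two-byte characters are
 * [0x81-0xFE][0x40-0x7E or 0x80-0xFE]; four-byte characters are
 * [0x81-0xFE][0x30-0x39][0x81-0xFE][0x30-0x39].  The second-byte
 * ranges are disjoint, so the second byte selects the form. */
int gb18030_charlen(const unsigned char *s, size_t n)
{
	if (n == 0)
		return CHAR_INCOMPLETE;
	if (s[0] <= 0x7F)
		return 1;
	if (s[0] < 0x81 || s[0] > 0xFE)
		return CHAR_INVALID;
	if (n < 2)
		return CHAR_INCOMPLETE;
	if ((s[1] >= 0x40 && s[1] <= 0x7E) || (s[1] >= 0x80 && s[1] <= 0xFE))
		return 2;
	if (s[1] < 0x30 || s[1] > 0x39)
		return CHAR_INVALID;
	if (n < 3)
		return CHAR_INCOMPLETE;
	if (s[2] < 0x81 || s[2] > 0xFE)
		return CHAR_INVALID;
	if (n < 4)
		return CHAR_INCOMPLETE;
	if (s[3] < 0x30 || s[3] > 0x39)
		return CHAR_INVALID;
	return 4;
}

/* mskanji(5): single bytes are 0x00-0x7F (ASCII) or 0xA1-0xDF
 * (half-width katakana); lead bytes are 0x81-0x9F, 0xE0-0xEA, 0xED-0xEE,
 * 0xF0-0xF9 (user defined) and 0xFA-0xFC (IBM), merged as 0xF0-0xFC;
 * trail bytes are 0x40-0xFC except 0x7F. */
int mskanji_charlen(const unsigned char *s, size_t n)
{
	if (n == 0)
		return CHAR_INCOMPLETE;
	if (s[0] <= 0x7F || (s[0] >= 0xA1 && s[0] <= 0xDF))
		return 1;
	if (!((s[0] >= 0x81 && s[0] <= 0x9F) || (s[0] >= 0xE0 && s[0] <= 0xEA) ||
	    (s[0] >= 0xED && s[0] <= 0xEE) || (s[0] >= 0xF0 && s[0] <= 0xFC)))
		return CHAR_INVALID;
	if (n < 2)
		return CHAR_INCOMPLETE;
	if (s[1] < 0x40 || s[1] > 0xFC || s[1] == 0x7F)
		return CHAR_INVALID;
	return 2;
}

/* Count the characters in s[0..n) using one of the functions above,
 * or return the first error code encountered. */
long count_chars(int (*charlen)(const unsigned char *, size_t),
    const unsigned char *s, size_t n)
{
	long count = 0;

	while (n > 0) {
		int len = charlen(s, n);

		if (len < 0)
			return len;
		s += len;
		n -= (size_t)len;
		count++;
	}
	return count;
}
```

For example, `count_chars(gb18030_charlen, buf, len)` walks a GB18030 buffer. A truncated trailing sequence yields `CHAR_INCOMPLETE`, which a caller reading from a stream would treat as "need more input" rather than as an error, much as mbrtowc(3) callers treat `(size_t)-2`.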