.\" Copyright (c) 2002, 2003 Tim J. Robbins
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd August 10, 2003
.Dt GB18030 5
.Os
.Sh NAME
.Nm gb18030
.Nd "GB 18030 encoding method for Chinese text"
.Sh SYNOPSIS
.Nm ENCODING
.Qq GB18030
.Sh DESCRIPTION
The
.Nm GB18030
encoding implements GB\ 18030-2000, a PRC national standard for the encoding of
Chinese characters.
It is a superset of the older GB\ 2312-1980 and GBK encodings
and includes all of Unicode's CJK Unified Ideographs Extension A.
It also provides a code position for every Unicode 3.0 code point.
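.Pp
As an illustration, code points outside the Basic Multilingual Plane
(U+10000 to U+10FFFF) occupy a single linear run of four-byte sequences,
starting at 0x90 0x30 0x81 0x30 and ending at 0xE3 0x32 0x9A 0x35.
The C sketch below computes that mapping from the four-byte structure described below.
The function name
.Fn gb18030_encode_supplementary
is hypothetical and not part of any library.
Mappings within the Basic Multilingual Plane are table-driven and are not covered by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode a supplementary-plane code point
 * (U+10000 to U+10FFFF) as a GB18030 four-byte sequence.
 * A four-byte sequence b1 b2 b3 b4 has the linear index
 *   (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
 * and U+10000 sits at the index of 0x90 0x30 0x81 0x30, which is 189000.
 * Returns 0 on success, -1 if cp is outside that range.
 */
static int
gb18030_encode_supplementary(uint32_t cp, unsigned char out[4])
{
	uint32_t idx;

	assert(out != NULL);
	if (cp < 0x10000 || cp > 0x10FFFF)
		return (-1);
	idx = 189000 + (cp - 0x10000);
	out[3] = (unsigned char)(0x30 + idx % 10);	/* fourth byte */
	idx /= 10;
	out[2] = (unsigned char)(0x81 + idx % 126);	/* third byte */
	idx /= 126;
	out[1] = (unsigned char)(0x30 + idx % 10);	/* second byte */
	idx /= 10;
	out[0] = (unsigned char)(0x81 + idx);		/* first byte */
	return (0);
}
```

Decoding reverses the same arithmetic: compute the linear index of the four bytes and subtract 189000.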
.Pp
Characters in the
.Nm GB18030
encoding are one, two, or four bytes long.
Together these forms provide over 1.5 million code positions.
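.Pp
The figure follows from the byte ranges given below.
A minimal C sketch, using the hypothetical functions
.Fn gb18030_code_positions
and
.Fn gb18030_total_positions ,
multiplies out those ranges: 128 one-byte, 23940 two-byte and 1587600 four-byte positions, 1611668 in all.
These are structural positions; not every one is assigned to a character.

```c
#include <assert.h>

/*
 * Multiply out the byte ranges from this manual page to count the
 * code positions of each form.  Function names are hypothetical.
 */
static long
gb18030_code_positions(int nbytes)
{
	long lead = 0xFE - 0x81 + 1;			/* 126 first (and third) bytes */
	long trail = (0x7E - 0x40 + 1) + (0xFE - 0x80 + 1);	/* 63 + 127 = 190 */
	long digit = 0x39 - 0x30 + 1;			/* 10 digit bytes */

	switch (nbytes) {
	case 1:
		return (0x7F - 0x00 + 1);		/* 128 */
	case 2:
		return (lead * trail);			/* 23940 */
	case 4:
		return (lead * digit * lead * digit);	/* 1587600 */
	default:
		return (0);				/* no other lengths exist */
	}
}

static long
gb18030_total_positions(void)
{
	long total = gb18030_code_positions(1) + gb18030_code_positions(2) +
	    gb18030_code_positions(4);

	/* Consistent with "over 1.5 million": the sum is 1611668. */
	assert(total > 1500000);
	return (total);
}
```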
.Pp
.No GB\ 11383-1981 Pq Tn ASCII
characters are represented by single bytes in the range 0x00 to 0x7F.
.Pp
All other characters, including Chinese characters, are represented by two or four bytes.
Two-byte characters begin with a byte in the range 0x81-0xFE
and end with a byte in the range 0x40-0x7E or 0x80-0xFE.
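.Pp
The two-byte rule can be expressed as a structural check.
The following C sketch assumes a hypothetical helper,
.Fn gb18030_is_2byte ,
that inspects the first two bytes of a buffer.
It tests byte ranges only and does not determine whether a position is assigned to a character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if the two bytes at s have the shape of a
 * GB18030 two-byte character (first byte 0x81-0xFE, second byte
 * 0x40-0x7E or 0x80-0xFE).  The caller must supply at least two bytes.
 * Whether the position is assigned to a character is not checked.
 */
static bool
gb18030_is_2byte(const unsigned char *s)
{
	assert(s != NULL);
	if (s[0] < 0x81 || s[0] > 0xFE)
		return (false);
	return ((s[1] >= 0x40 && s[1] <= 0x7E) ||
	    (s[1] >= 0x80 && s[1] <= 0xFE));
}
```

Note that 0x7F is excluded as a second byte, so a DEL character can never be mistaken for the end of a two-byte character.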
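.Pp
Four-byte characters begin with a byte in the range 0x81-0xFE,
followed by a second byte in the range 0x30-0x39,
a third byte in the range 0x81-0xFE,
and a fourth byte in the range 0x30-0x39.
Because 0x30-0x39 lies outside both second-byte ranges of the two-byte form,
the second byte alone tells a decoder whether a character is two or four bytes long.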
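.Pp
Taken together, the three forms let a decoder find the length of the next character from its first two bytes.
The following C sketch uses a hypothetical function,
.Fn gb18030_seqlen ,
that returns 1, 2 or 4 for a well-formed sequence, \-1 for invalid bytes,
and \-2 when the buffer ends before the sequence is complete,
loosely following the conventions of
.Xr mbrlen 3 .
Like the rules above, it checks byte ranges only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: length of the GB18030 character at the start of
 * s (n bytes available).  Returns 1, 2 or 4 for a well-formed sequence,
 * -1 for bytes outside the documented ranges, and -2 if the buffer
 * ends before the sequence is complete.  Only byte ranges are checked.
 */
static int
gb18030_seqlen(const unsigned char *s, size_t n)
{
	assert(s != NULL || n == 0);
	if (n < 1)
		return (-2);
	if (s[0] <= 0x7F)
		return (1);				/* single byte */
	if (s[0] < 0x81 || s[0] > 0xFE)
		return (-1);				/* 0x80 and 0xFF never lead */
	if (n < 2)
		return (-2);
	if ((s[1] >= 0x40 && s[1] <= 0x7E) || (s[1] >= 0x80 && s[1] <= 0xFE))
		return (2);				/* two-byte form */
	if (s[1] < 0x30 || s[1] > 0x39)
		return (-1);
	if (n < 3)
		return (-2);
	if (s[2] < 0x81 || s[2] > 0xFE)
		return (-1);
	if (n < 4)
		return (-2);
	if (s[3] < 0x30 || s[3] > 0x39)
		return (-1);
	return (4);					/* four-byte form */
}
```

A caller would step through a buffer by the returned length and stop on a negative value.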
.Sh SEE ALSO
.Xr euc 5 ,
.Xr gb2312 5 ,
.Xr gbk 5 ,
.Xr utf8 5
.Rs
.%T "Chinese National Standard GB 18030-2000: Information Technology -- Chinese ideograms coded character set for information interchange -- Extension for the basic set"
.%D "March 2000"
.Re
.Rs
.%Q "The Unicode Consortium"
.%T "The Unicode Standard, Version 3.0"
.%D "2000"
.Re
.Sh STANDARDS
The
.Nm GB18030
encoding is believed to be compatible with GB 18030-2000.