.\" Copyright (c) 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Paul Borman at Krystal Technologies.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)utf2.4 8.1 (Berkeley) 6/4/93
.\" $FreeBSD$
.\"
.Dd June 4, 1993
.Dt UTF2 4
.Os
.Sh NAME
.Nm utf2
.Nd Universal character set Transformation Format encoding of runes
.Sh SYNOPSIS
.Nm ENCODING
.Qq UTF2
.Sh DESCRIPTION
The
.Nm UTF2
encoding is based on a proposed X/Open multibyte
\s-1FSS-UCS-TF\s+1
(File System Safe Universal Character Set Transformation Format) encoding
as used in Plan 9 from Bell Labs.
Although it is capable of representing more than 16 bits,
the current implementation is limited to 16 bits as defined by the
Unicode Standard.
.Pp
.Nm UTF2
representation is backwards compatible with ASCII, so the byte values
0x00-0x7f represent the ASCII character set.
The multibyte encoding of a rune between 0x0080 and 0xffff
consists entirely of bytes whose high order bit is set.
The actual encoding is represented by the following table:
.Bd -literal
[0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
.Ed
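.Pp
As an illustration only, the table can be read as the following C sketch.
The function name
.Fn utf2_encode
and its interface are hypothetical and are not part of the C library;
the sketch assumes the caller supplies a buffer of at least three bytes
and treats values above 0xffff as unencodable.
.Bd -literal
```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libc: encode one 16-bit rune into buf
 * following the three rows of the table.  Returns the number of bytes
 * written (1 to 3), or 0 if the rune does not fit in 16 bits.
 * buf must have room for at least 3 bytes.
 */
size_t
utf2_encode(unsigned long rune, unsigned char *buf)
{
	if (rune > 0xffffUL)		/* beyond the 16-bit limit */
		return 0;
	if (rune <= 0x007fUL) {		/* 0bbbbbbb */
		buf[0] = (unsigned char)rune;
		return 1;
	}
	if (rune <= 0x07ffUL) {		/* 110bbbbb 10bbbbbb */
		buf[0] = (unsigned char)(0xc0 | (rune >> 6));
		buf[1] = (unsigned char)(0x80 | (rune & 0x3f));
		return 2;
	}
	/* 1110bbbb 10bbbbbb 10bbbbbb */
	buf[0] = (unsigned char)(0xe0 | (rune >> 12));
	buf[1] = (unsigned char)(0x80 | ((rune >> 6) & 0x3f));
	buf[2] = (unsigned char)(0x80 | (rune & 0x3f));
	return 3;
}
```
.Ed
.Pp
Under these assumptions the rune 0x20ac falls in the third row and is
written as the three bytes 0xe2 0x82 0xac.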
.Pp
If a value has more than one representation (for example, zero can be
written as 0x00, as 0xC0 0x80, or as 0xE0 0x80 0x80), the shortest
representation is always used when encoding,
but the longer ones will still be correctly decoded.
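.Pp
The following C sketch shows one way a decoder can accept the longer
forms.
It is not the library's implementation: the name
.Fn utf2_decode
and its interface are hypothetical, and the only assumption taken from
the text is that overlong sequences decode to the value they carry.
.Bd -literal
```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libc: decode one rune from the first
 * n bytes of s.  Stores the rune in *rune and returns the number of bytes
 * consumed, or 0 if the sequence is truncated or malformed.  Overlong
 * forms are accepted on purpose; no shortest-form check is made.
 */
size_t
utf2_decode(const unsigned char *s, size_t n, unsigned int *rune)
{
	/* 0bbbbbbb */
	if (n >= 1 && (s[0] & 0x80) == 0x00) {
		*rune = s[0];
		return 1;
	}
	/* 110bbbbb 10bbbbbb */
	if (n >= 2 && (s[0] & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80) {
		*rune = ((s[0] & 0x1fu) << 6) | (s[1] & 0x3fu);
		return 2;
	}
	/* 1110bbbb 10bbbbbb 10bbbbbb */
	if (n >= 3 && (s[0] & 0xf0) == 0xe0 &&
	    (s[1] & 0xc0) == 0x80 && (s[2] & 0xc0) == 0x80) {
		*rune = ((s[0] & 0x0fu) << 12) | ((s[1] & 0x3fu) << 6) |
		    (s[2] & 0x3fu);
		return 3;
	}
	return 0;
}
```
.Ed
.Pp
With this sketch the sequences 0x00, 0xC0 0x80 and 0xE0 0x80 0x80 all
decode to the rune 0, consuming one, two and three bytes respectively.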
.Pp
X/Open provides three further encodings:
.Bd -literal
[00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
    11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
    111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
    1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
.Ed
.Pp
These provide for the entire proposed ISO-10646 31-bit standard
but are currently not implemented.
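.Pp
For orientation, the bit patterns of all six forms imply the value
ranges used in the C sketch below, which only computes how many bytes a
value would need if the longer forms were implemented.
The function name
.Fn utf2_full_len
is hypothetical and nothing like it is provided by the library.
.Bd -literal
```c
/*
 * Hypothetical helper, not part of libc: number of bytes the six-form
 * scheme would need for a value, or 0 if it does not fit in 31 bits.
 * Each limit is the largest value the form's payload bits can hold.
 */
int
utf2_full_len(unsigned long v)
{
	if (v <= 0x7fUL)		/*  7 bits: 0bbbbbbb */
		return 1;
	if (v <= 0x7ffUL)		/* 11 bits: 110bbbbb + 1 trailer */
		return 2;
	if (v <= 0xffffUL)		/* 16 bits: 1110bbbb + 2 trailers */
		return 3;
	if (v <= 0x1fffffUL)		/* 21 bits: 11110bbb + 3 trailers */
		return 4;
	if (v <= 0x3ffffffUL)		/* 26 bits: 111110bb + 4 trailers */
		return 5;
	if (v <= 0x7fffffffUL)		/* 31 bits: 1111110b + 5 trailers */
		return 6;
	return 0;
}
```
.Ed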
.Sh "SEE ALSO"
.Xr mklocale 1 ,
.Xr setlocale 3