.\" Copyright (c) 2002-2004 Tim J. Robbins. All rights reserved.
.\" Copyright (c) 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)multibyte.3 8.1 (Berkeley) 6/4/93
.\" $FreeBSD$
.\"
.Dd April 8, 2004
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm multibyte
.Nd multibyte and wide character manipulation functions
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In limits.h
.In stdlib.h
.In wchar.h
.Sh DESCRIPTION
The basic elements of some written natural languages, such as Chinese,
cannot be represented uniquely with single C
.Vt char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings:
wide characters and
multibyte characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Vt wchar_t .
Multibyte characters are used for input and output
and code each basic element as a sequence of C
.Vt char Ns s .
Individual basic elements may map into one or more
(up to
.Dv MB_LEN_MAX )
bytes in a multibyte character.
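.Pp
The difference between bytes and characters can be illustrated with a
short sketch.
The helper
.Fn count_mb_chars
is a hypothetical name that is not part of the C library.
It assumes the caller has already selected a locale, and it uses
.Xr mbrtowc 3
to step through a string one multibyte character at a time:
.Bd -literal -offset indent
```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not a libc function): count the characters,
 * not the bytes, in the multibyte string s as interpreted by the
 * current LC_CTYPE locale.  Returns (size_t)-1 if s contains an
 * invalid or truncated multibyte sequence.
 */
size_t
count_mb_chars(const char *s)
{
	mbstate_t st;
	size_t count = 0, left = strlen(s), n;

	memset(&st, 0, sizeof(st));	/* start in the initial state */
	while (left > 0) {
		/* n is the number of bytes in the next character */
		n = mbrtowc(NULL, s, left, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return ((size_t)-1);
		if (n == 0)		/* null character; not expected here */
			break;
		s += n;
		left -= n;
		count++;
	}
	return (count);
}
```
.Ed
.Pp
In the default
.Dq C
locale every byte is one character, so the result matches
.Xr strlen 3 ;
in a locale with a multibyte encoding the count is typically smaller
than the byte length for non-ASCII text.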
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Vt wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
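.Pp
As a minimal sketch of how the locale affects multibyte handling, the
hypothetical helper
.Fn max_bytes_per_char
(an illustrative name, not a library function) selects an
.Dv LC_CTYPE
locale and reports the standard
.Dv MB_CUR_MAX
value, the longest multibyte character in that locale:
.Bd -literal -offset indent
```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a libc function): switch LC_CTYPE to the
 * named locale and return the maximum number of bytes in a multibyte
 * character for it.  Returns 0 if the locale is not available.
 * MB_CUR_MAX never exceeds the compile-time constant MB_LEN_MAX.
 */
size_t
max_bytes_per_char(const char *locale)
{
	if (setlocale(LC_CTYPE, locale) == NULL)
		return (0);
	return (MB_CUR_MAX);
}
```
.Ed
.Pp
Passing an empty string as the locale name selects the locale from the
environment; which locale names are installed varies between systems.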
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
Some functions (e.g.,
.Xr mblen 3 ,
.Xr mbtowc 3
and
.Xr wctomb 3 )
maintain static shift state internally, whereas
others store it in an
.Vt mbstate_t
object passed by the caller.
Shift states are undefined after a call to
.Xr setlocale 3
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
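.Pp
Because shift states are undefined after such a locale change, a program
may want to return to the initial state explicitly.
The following sketch shows one way to do so; the function names
.Fn reset_internal_shift_state
and
.Fn reset_state
are hypothetical and chosen only for illustration.
It relies on the standard behavior that calling
.Xr mblen 3 ,
.Xr mbtowc 3
or
.Xr wctomb 3
with a null string pointer resets the internal state, and that a zeroed
.Vt mbstate_t
object describes the initial state:
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: put the static shift state kept by mblen(),
 * mbtowc() and wctomb() back into the initial state.  Returns 1 if
 * the current encoding is state-dependent and 0 otherwise.
 */
int
reset_internal_shift_state(void)
{
	int stateful;

	stateful = mblen(NULL, 0);
	(void)mbtowc(NULL, NULL, 0);
	(void)wctomb(NULL, 0);
	return (stateful != 0);
}

/*
 * Hypothetical helper: put a caller-owned conversion state object
 * into the initial state for use with the restartable functions.
 */
void
reset_state(mbstate_t *ps)
{
	memset(ps, 0, sizeof(*ps));
}
```
.Ed
.Pp
The standard function
.Fn mbsinit
can be used afterwards to confirm that an
.Vt mbstate_t
object is in the initial state.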
.Pp
For convenience in processing,
the wide character with value 0
(the null wide character)
is recognized as the wide character string terminator,
and the character with value 0
(the null byte)
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The C library provides the following functions for dealing with
multibyte characters:
.Bl -column "Description"
.It Sy Function Ta Sy Description
.It Xr mblen 3 Ta "get number of bytes in a character"
.It Xr mbrlen 3 Ta "get number of bytes in a character (restartable)"
.It Xr mbrtowc 3 Ta "convert a character to a wide-character code (restartable)"
.It Xr mbsrtowcs 3 Ta "convert a character string to a wide-character string (restartable)"
.It Xr mbstowcs 3 Ta "convert a character string to a wide-character string"
.It Xr mbtowc 3 Ta "convert a character to a wide-character code"
.It Xr wcrtomb 3 Ta "convert a wide-character code to a character (restartable)"
.It Xr wcstombs 3 Ta "convert a wide-character string to a character string"
.It Xr wcsrtombs 3 Ta "convert a wide-character string to a character string (restartable)"
.It Xr wctomb 3 Ta "convert a wide-character code to a character"
.El
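.Pp
As an illustration of how the string conversion functions listed above
are typically combined, the sketch below converts a whole multibyte
string into a newly allocated wide character string.
The name
.Fn mb_to_wide
is hypothetical; the sketch assumes the input starts in the initial
shift state and calls
.Xr mbsrtowcs 3
twice, first with a null destination to measure the result and then to
convert:
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not a libc function): convert the multibyte
 * string s to a wide character string under the current LC_CTYPE
 * locale.  Returns NULL on an invalid sequence or allocation failure.
 * The caller frees the result with free().
 */
wchar_t *
mb_to_wide(const char *s)
{
	mbstate_t st;
	const char *p;
	wchar_t *w;
	size_t n;

	/* First pass: count wide characters, excluding the terminator. */
	memset(&st, 0, sizeof(st));
	p = s;
	n = mbsrtowcs(NULL, &p, 0, &st);
	if (n == (size_t)-1)
		return (NULL);

	w = malloc((n + 1) * sizeof(*w));
	if (w == NULL)
		return (NULL);

	/* Second pass: convert, including the null wide character. */
	memset(&st, 0, sizeof(st));
	p = s;
	(void)mbsrtowcs(w, &p, n + 1, &st);
	return (w);
}
```
.Ed
.Pp
The reverse direction follows the same pattern with
.Xr wcsrtombs 3 .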
.Sh SEE ALSO
.Xr mklocale 1 ,
.Xr setlocale 3 ,
.Xr stdio 3 ,
.Xr big5 5 ,
.Xr euc 5 ,
.Xr gb18030 5 ,
.Xr gb2312 5 ,
.Xr gbk 5 ,
.Xr mskanji 5 ,
.Xr utf8 5
.Sh STANDARDS
These functions conform to
.St -isoC-99 .