Network Working Group                                       P. Faltstrom
Request for Comments: 3490                                         Cisco
Category: Standards Track                                     P. Hoffman
                                                              IMC & VPNC
                                                             A. Costello
                                                             UC Berkeley
                                                              March 2003

         Internationalizing Domain Names in Applications (IDNA)

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

Until now, there has been no standard method for domain names to use
characters outside the ASCII repertoire. This document defines
internationalized domain names (IDNs) and a mechanism called
Internationalizing Domain Names in Applications (IDNA) for handling
them in a standard fashion. IDNs use characters drawn from a large
repertoire (Unicode), but IDNA allows the non-ASCII characters to be
represented using only the ASCII characters already allowed in
so-called host names today. This backward-compatible representation is
required in existing protocols like DNS, so that IDNs can be
introduced with no changes to the existing infrastructure. IDNA is
only meant for processing domain names, not free text.

Table of Contents

1. Introduction
   1.1 Problem Statement
   1.2 Limitations of IDNA
   1.3 Brief overview for application developers
2. Terminology
3. Requirements and applicability
   3.1 Requirements
   3.2 Applicability
      3.2.1. DNS resource records

Faltstrom, et al. Standards Track [Page 1]
RFC 3490 IDNA March 2003

      3.2.2. Non-domain-name data types stored in domain names
4. Conversion operations
   4.1 ToASCII
   4.2 ToUnicode
5. ACE prefix
6. Implications for typical applications using DNS
   6.1 Entry and display in applications
   6.2 Applications and resolver libraries
   6.3 DNS servers
   6.4 Avoiding exposing users to the raw ACE encoding
   6.5 DNSSEC authentication of IDN domain names
7. Name server considerations
8. Root server considerations
9. References
   9.1 Normative References
   9.2 Informative References
10. Security Considerations
11. IANA Considerations
12. Authors' Addresses
13. Full Copyright Statement

1. Introduction

IDNA works by allowing applications to use certain ASCII name labels
(beginning with a special prefix) to represent non-ASCII name labels.
Lower-layer protocols need not be aware of this; therefore IDNA does
not depend on changes to any infrastructure. In particular, IDNA
does not depend on any changes to DNS servers, resolvers, or protocol
elements, because the ASCII name service provided by the existing DNS
is entirely sufficient for IDNA.

This document does not require any applications to conform to IDNA,
but applications can elect to use IDNA in order to support IDN while
maintaining interoperability with existing infrastructure. If an
application wants to use non-ASCII characters in domain names, IDNA
is the only currently-defined option. Adding IDNA support to an
existing application entails changes to the application only, and
leaves room for flexibility in the user interface.

A great deal of the discussion of IDN solutions has focused on
transition issues and how IDN will work in a world where not all of
the components have been updated. Proposals that were not chosen by
the IDN Working Group would depend on user applications, resolvers,
and DNS servers being updated in order for a user to use an
internationalized domain name. Rather than rely on widespread
updating of all components, IDNA depends on updates to user
applications only; no changes are needed to the DNS protocol or any
DNS servers or the resolvers on users' computers.

1.1 Problem Statement

The IDNA specification solves the problem of extending the repertoire
of characters that can be used in domain names to include the Unicode
repertoire (with some restrictions).

IDNA does not extend the service offered by DNS to the applications.
Instead, the applications (and, by implication, the users) continue
to see an exact-match lookup service. Either there is a single
exactly-matching name or there is no match. This model has served
the existing applications well, but it requires, with or without
internationalized domain names, that users know the exact spelling of
the domain names that the users type into applications such as web
browsers and mail user agents. The introduction of the larger
repertoire of characters potentially makes the set of misspellings
larger, especially given that in some cases the same appearance, for
example on a business card, might visually match several Unicode code
points or several sequences of code points.

IDNA allows the graceful introduction of IDNs not only by avoiding
upgrades to existing infrastructure (such as DNS servers and mail
transport agents), but also by allowing some rudimentary use of IDNs
in applications by using the ASCII representation of the non-ASCII
name labels. While such names are very user-unfriendly to read and
type, and hence are not suitable for user input, they allow (for
instance) replying to email and clicking on URLs even though the
domain name displayed is incomprehensible to the user. In order to
allow user-friendly input and output of the IDNs, the applications
need to be modified to conform to this specification.

IDNA uses the Unicode character repertoire, which avoids the
significant delays that would be inherent in waiting for a different
and specific character set to be defined for IDN purposes by some
other standards developing organization.

1.2 Limitations of IDNA

The IDNA protocol does not solve all linguistic issues with users
inputting names in different scripts. Many important language-based
and script-based mappings are not covered in IDNA and need to be
handled outside the protocol. For example, names that are entered in
a mix of traditional and simplified Chinese characters will not be
mapped to a single canonical name. Another example is that
Scandinavian names entered with U+00F6 (LATIN SMALL LETTER O WITH
DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL LETTER O WITH
STROKE).

An example of an important issue that is not considered in detail in
IDNA is how to provide a high probability that a user who is entering
a domain name based on visual information (such as from a business
card or billboard) or aural information (such as from a telephone or
radio) would correctly enter the IDN. Similar issues exist for ASCII
domain names, for example the possible visual confusion between the
letter 'O' and the digit zero, but the introduction of the larger
repertoire of characters creates more opportunities for
similar-looking and similar-sounding names. Note that this is a
complex issue relating to languages, input methods on computers, and
so on. Furthermore, the kind of matching and searching necessary for
a high probability of success would not fit the role of the DNS and
its exact matching function.

1.3 Brief overview for application developers

Applications can use IDNA to support internationalized domain names
anywhere that ASCII domain names are already supported, including DNS
master files and resolver interfaces. (Applications can also define
protocols and interfaces that support IDNs directly using non-ASCII
representations. IDNA does not prescribe any particular
representation for new protocols, but it still defines which names
are valid and how they are compared.)

The IDNA protocol is contained completely within applications. It is
not a client-server or peer-to-peer protocol: everything is done
inside the application itself. When used with a DNS resolver
library, IDNA is inserted as a "shim" between the application and the
resolver library. When used for writing names into a DNS zone, IDNA
is used just before the name is committed to the zone.

There are two operations described in section 4 of this document:

- The ToASCII operation is used before sending an IDN to something
  that expects ASCII names (such as a resolver) or writing an IDN
  into a place that expects ASCII names (such as a DNS master file).

- The ToUnicode operation is used when displaying names to users,
  for example names obtained from a DNS zone.

It is important to note that the ToASCII operation can fail. If it
fails when processing a domain name, that domain name cannot be used
as an internationalized domain name and the application has to have
some method of dealing with this failure.

IDNA requires that implementations process input strings with
Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP],
and then with Punycode [PUNYCODE]. Implementations of IDNA MUST
fully implement Nameprep and Punycode; neither Nameprep nor Punycode
is optional.

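As an informal illustration of the two operations and of a ToASCII
failure, the sketch below relies on the "encodings.idna" module from
the Python standard library, which provides ToASCII and ToUnicode
functions for single labels. The helper names "to_ascii_or_none" and
"display_form" are hypothetical and are not defined by this document;
how an application reacts to a failure remains its own choice.

```python
from encodings import idna


def to_ascii_or_none(label):
    """Apply ToASCII to one label; return None if the operation fails."""
    try:
        return idna.ToASCII(label).decode("ascii")
    except UnicodeError:
        # ToASCII failed, so this label cannot be used in an IDN.
        return None


def display_form(label):
    """Apply ToUnicode to one label before showing it to a user."""
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        # Fall back to the label exactly as it was received.
        return label


if __name__ == "__main__":
    print(to_ascii_or_none("bücher"))      # label sent to a resolver
    print(display_form("xn--bcher-kva"))   # label shown to a user
```

Returning None is only one possible way of reporting failure; an
application could equally raise an error or ask the user to re-enter
the name.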

2. Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
and "MAY" in this document are to be interpreted as described in BCP
14, RFC 2119 [RFC2119].

A code point is an integer value associated with a character in a
coded character set.

Unicode [UNICODE] is a coded character set containing tens of
thousands of characters. A single Unicode code point is denoted by
"U+" followed by four to six hexadecimal digits, while a range of
Unicode code points is denoted by two hexadecimal numbers separated
by "..", with no prefixes.

ASCII means US-ASCII [USASCII], a coded character set containing 128
characters associated with code points in the range 0..7F. Unicode
is an extension of ASCII: it includes all the ASCII characters and
associates them with the same code points.

The term "LDH code points" is defined in this document to mean the
code points associated with ASCII letters, digits, and the
hyphen-minus; that is, U+002D, 30..39, 41..5A, and 61..7A. "LDH" is an
abbreviation for "letters, digits, hyphen".

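The LDH ranges above translate directly into a membership test. The
following is a minimal sketch; the function names "is_ldh" and
"is_ldh_label" are illustrative assumptions, not names used by this
document.

```python
def is_ldh(code_point):
    """Return True for U+002D, 30..39, 41..5A, and 61..7A."""
    return (
        code_point == 0x2D
        or 0x30 <= code_point <= 0x39
        or 0x41 <= code_point <= 0x5A
        or 0x61 <= code_point <= 0x7A
    )


def is_ldh_label(label):
    """Return True if every code point in the label is an LDH code point."""
    return all(is_ldh(ord(ch)) for ch in label)
```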

[STD13] talks about "domain names" and "host names", but many people
use the terms interchangeably. Further, because [STD13] was not
terribly clear, many people who are sure they know the exact
definitions of each of these terms disagree on the definitions. In
this document the term "domain name" is used in general. This
document explicitly cites [STD3] whenever referring to the host name
syntax restrictions defined therein.

A label is an individual part of a domain name. Labels are usually
shown separated by dots; for example, the domain name
"www.example.com" is composed of three labels: "www", "example", and
"com". (The zero-length root label described in [STD13], which can
be explicit as in "www.example.com." or implicit as in
"www.example.com", is not considered a label in this specification.)
IDNA extends the set of usable characters in labels that are text.
For the rest of this document, the term "label" is shorthand for
"text label", and "every label" means "every text label".

An "internationalized label" is a label to which the ToASCII
operation (see section 4) can be applied without failing (with the
UseSTD3ASCIIRules flag unset). This implies that every ASCII label
that satisfies the [STD13] length restriction is an internationalized
label. Therefore the term "internationalized label" is a
generalization, embracing both old ASCII labels and new non-ASCII
labels. Although most Unicode characters can appear in
internationalized labels, ToASCII will fail for some input strings,
and such strings are not valid internationalized labels.

An "internationalized domain name" (IDN) is a domain name in which
every label is an internationalized label. This implies that every
ASCII domain name is an IDN (which implies that it is possible for a
name to be an IDN without it containing any non-ASCII characters).
This document does not attempt to define an "internationalized host
name". Just as has been the case with ASCII names, some DNS zone
administrators may impose restrictions, beyond those imposed by DNS
or IDNA, on the characters or strings that may be registered as
labels in their zones. Such restrictions have no impact on the
syntax or semantics of DNS protocol messages; a query for a name that
matches no records will yield the same response regardless of the
reason why it is not in the zone. Clients issuing queries or
interpreting responses cannot be assumed to have any knowledge of
zone-specific restrictions or conventions.

In IDNA, equivalence of labels is defined in terms of the ToASCII
operation, which constructs an ASCII form for a given label, whether
or not the label was already an ASCII label. Labels are defined to
be equivalent if and only if their ASCII forms produced by ToASCII
match using a case-insensitive ASCII comparison. ASCII labels
already have a notion of equivalence: upper case and lower case are
considered equivalent. The IDNA notion of equivalence is an
extension of that older notion. Equivalent labels in IDNA are
treated as alternate forms of the same label, just as "foo" and "Foo"
are treated as alternate forms of the same label.

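The equivalence rule can be sketched in a few lines. This example
assumes the ToASCII function from the "encodings.idna" module of the
Python standard library; the name "labels_equivalent" is hypothetical,
and treating a ToASCII failure as "not equivalent" is a choice made
for this sketch rather than something this document prescribes.

```python
from encodings import idna


def labels_equivalent(a, b):
    """Compare two labels by their ToASCII forms, ignoring ASCII case."""
    try:
        ascii_a = idna.ToASCII(a)
        ascii_b = idna.ToASCII(b)
    except UnicodeError:
        # ToASCII failed, so at least one input is not an
        # internationalized label and cannot be compared this way.
        return False
    return ascii_a.lower() == ascii_b.lower()
```

Under these assumptions "foo" and "Foo" compare as equivalent, and so
do a non-ASCII label and the ACE label that ToASCII produces for it.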

To allow internationalized labels to be handled by existing
applications, IDNA uses an "ACE label" (ACE stands for ASCII
Compatible Encoding). An ACE label is an internationalized label
that can be rendered in ASCII and is equivalent to an
internationalized label that cannot be rendered in ASCII. Given any
internationalized label that cannot be rendered in ASCII, the ToASCII
operation will convert it to an equivalent ACE label (whereas an
ASCII label will be left unaltered by ToASCII). ACE labels are
unsuitable for display to users. The ToUnicode operation will
convert any label to an equivalent non-ACE label. In fact, an ACE
label is formally defined to be any label that the ToUnicode
operation would alter (whereas non-ACE labels are left unaltered by
ToUnicode). Every ACE label begins with the ACE prefix specified in
section 5. The ToASCII and ToUnicode operations are specified in
section 4.

The "ACE prefix" is defined in this document to be a string of ASCII
characters that appears at the beginning of every ACE label. It is
specified in section 5.

A "domain name slot" is defined in this document to be a protocol
element or a function argument or a return value (and so on)
explicitly designated for carrying a domain name. Examples of domain
name slots include: the QNAME field of a DNS query; the name argument
of the gethostbyname() library function; the part of an email address
following the at-sign (@) in the From: field of an email message
header; and the host portion of the URI in the src attribute of an
HTML <IMG> tag. General text that just happens to contain a domain
name is not a domain name slot; for example, a domain name appearing
in the plain text body of an email message is not occupying a domain
name slot.

An "IDN-aware domain name slot" is defined in this document to be a
domain name slot explicitly designated for carrying an
internationalized domain name as defined in this document. The
designation may be static (for example, in the specification of the
protocol or interface) or dynamic (for example, as a result of
negotiation in an interactive session).

An "IDN-unaware domain name slot" is defined in this document to be
any domain name slot that is not an IDN-aware domain name slot.
Obviously, this includes any domain name slot whose specification
predates IDNA.

3. Requirements and applicability

3.1 Requirements

IDNA conformance means adherence to the following four requirements:

1) Whenever dots are used as label separators, the following
   characters MUST be recognized as dots: U+002E (full stop), U+3002
   (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
   (halfwidth ideographic full stop).

2) Whenever a domain name is put into an IDN-unaware domain name slot
   (see section 2), it MUST contain only ASCII characters. Given an
   internationalized domain name (IDN), an equivalent domain name
   satisfying this requirement can be obtained by applying the
   ToASCII operation (see section 4) to each label and, if dots are
   used as label separators, changing all the label separators to
   U+002E.

3) ACE labels obtained from domain name slots SHOULD be hidden from
   users when it is known that the environment can handle the non-ACE
   form, except when the ACE form is explicitly requested. When it
   is not known whether or not the environment can handle the non-ACE
   form, the application MAY use the non-ACE form (which might fail,
   such as by not being displayed properly), or it MAY use the ACE
   form (which will look unintelligible to the user). Given an
   internationalized domain name, an equivalent domain name
   containing no ACE labels can be obtained by applying the ToUnicode
   operation (see section 4) to each label. When requirements 2 and
   3 both apply, requirement 2 takes precedence.

4) Whenever two labels are compared, they MUST be considered to match
   if and only if they are equivalent, that is, their ASCII forms
   (obtained by applying ToASCII) match using a case-insensitive
   ASCII comparison. Whenever two names are compared, they MUST be
   considered to match if and only if their corresponding labels
   match, regardless of whether the names use the same forms of label
   separators.
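
Requirement 1, and the separator rewriting mentioned in requirement 2,
can be sketched with a single character class. The names "IDNA_DOTS",
"split_labels", and "normalize_separators" are assumptions made for
this example only.

```python
import re

# The four characters that requirement 1 says MUST be treated as dots.
IDNA_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name):
    """Split a domain name into labels on any of the four dots."""
    return IDNA_DOTS.split(name)


def normalize_separators(name):
    """Rewrite every recognized separator as U+002E (full stop)."""
    return IDNA_DOTS.sub(".", name)
```

Note that a name with an explicit root label, such as
"www.example.com.", yields a final empty string from "split_labels";
a caller would need to handle that case separately.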

3.2 Applicability

IDNA is applicable to all domain names in all domain name slots
except where it is explicitly excluded.

This implies that IDNA is applicable to many protocols that predate
IDNA. Note that IDNs occupying domain name slots in those protocols
MUST be in ASCII form (see section 3.1, requirement 2).

3.2.1. DNS resource records

IDNA does not apply to domain names in the NAME and RDATA fields of
DNS resource records whose CLASS is not IN. This exclusion applies
to every non-IN class, present and future, except where future
standards override this exclusion by explicitly inviting the use of
IDNA.

There are currently no other exclusions on the applicability of IDNA
to DNS resource records; it depends entirely on the CLASS, and not on
the TYPE. This will remain true, even as new types are defined,
unless there is a compelling reason for a new type to complicate
matters by imposing type-specific rules.

3.2.2. Non-domain-name data types stored in domain names

Although IDNA enables the representation of non-ASCII characters in
domain names, that does not imply that IDNA enables the
representation of non-ASCII characters in other data types that are
stored in domain names. For example, an email address local part is
sometimes stored in a domain label (hostmaster@example.com would be
represented as hostmaster.example.com in the RDATA field of an SOA
record). IDNA does not update the existing email standards, which
allow only ASCII characters in local parts. Therefore, unless the
email standards are revised to invite the use of IDNA for local
parts, a domain label that holds the local part of an email address
SHOULD NOT begin with the ACE prefix, and even if it does, it is to
be interpreted literally as a local part that happens to begin with
the ACE prefix.

4. Conversion operations

An application converts a domain name before it is put into an
IDN-unaware slot or displayed to a user. This section specifies the
steps to perform in the conversion, and the ToASCII and ToUnicode
operations.

The input to ToASCII or ToUnicode is a single label that is a
sequence of Unicode code points (remember that all ASCII code points
are also Unicode code points). If a domain name is represented using
a character set other than Unicode or US-ASCII, it will first need to
be transcoded to Unicode.

Starting from a whole domain name, the steps that an application
takes to do the conversions are:

1) Decide whether the domain name is a "stored string" or a "query
   string" as described in [STRINGPREP]. If this conversion follows
   the "queries" rule from [STRINGPREP], set the flag called
   "AllowUnassigned".

2) Split the domain name into individual labels as described in
   section 3.1. The labels do not include the separator.

3) For each label, decide whether or not to enforce the restrictions
   on ASCII characters in host names [STD3]. (Applications already
   faced this choice before the introduction of IDNA, and can
   continue to make the decision the same way they always have; IDNA
   makes no new recommendations regarding this choice.) If the
   restrictions are to be enforced, set the flag called
   "UseSTD3ASCIIRules" for that label.

4) Process each label with either the ToASCII or the ToUnicode
   operation as appropriate. Typically, you use the ToASCII
   operation if you are about to put the name into an IDN-unaware
   slot, and you use the ToUnicode operation if you are displaying
   the name to a user; section 3.1 gives greater detail on the
   applicable requirements.

5) If ToASCII was applied in step 4 and dots are used as label
   separators, change all the label separators to U+002E (full stop).

The following two subsections define the ToASCII and ToUnicode
operations that are used in step 4.

This description of the protocol uses specific procedure names, names
of flags, and so on, in order to facilitate the specification of the
protocol. These names, as well as the actual steps of the
procedures, are not required of an implementation. In fact, any
implementation which has the same external behavior as specified in
this document conforms to this specification.

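The whole-name procedure (steps 2, 4, and 5) might be sketched as
follows. This is an illustration under stated assumptions, not a
reference implementation: it uses the per-label ToASCII and ToUnicode
functions from the "encodings.idna" module of the Python standard
library, which take no flag arguments, so the AllowUnassigned and
UseSTD3ASCIIRules decisions of steps 1 and 3 are not modeled. The
function names "domain_to_ascii" and "domain_to_unicode" are
hypothetical.

```python
import re
from encodings import idna

# Label separators recognized by IDNA (section 3.1, requirement 1).
IDNA_DOTS = "[\u002E\u3002\uFF0E\uFF61]"


def domain_to_ascii(name):
    """Split on any IDNA dot, apply ToASCII per label, join with U+002E."""
    labels = re.split(IDNA_DOTS, name)
    # A trailing empty string stands for the root label; it is not converted.
    trailing_dot = len(labels) > 1 and labels[-1] == ""
    if trailing_dot:
        labels.pop()
    # idna.ToASCII raises UnicodeError if the operation fails for a label.
    converted = [idna.ToASCII(label).decode("ascii") for label in labels]
    return ".".join(converted) + ("." if trailing_dot else "")


def domain_to_unicode(name):
    """Apply ToUnicode per label for display, keeping separators as given."""
    parts = re.split("(" + IDNA_DOTS + ")", name)
    shown = []
    for index, part in enumerate(parts):
        if index % 2 == 1 or part == "":
            shown.append(part)  # a separator, or an empty (root) label
            continue
        try:
            shown.append(idna.ToUnicode(part))
        except UnicodeError:
            shown.append(part)  # leave the label as it was
    return "".join(shown)
```

In this sketch "domain_to_ascii" lets a ToASCII failure propagate as
an exception, since a name containing a failing label is not to be
used as an IDN; the display path instead falls back to the original
label.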

4.1 ToASCII

The ToASCII operation takes a sequence of Unicode code points that
make up one label and transforms it into a sequence of code points in
the ASCII range (0..7F). If ToASCII succeeds, the original sequence
and the resulting sequence are equivalent labels.

It is important to note that the ToASCII operation can fail. ToASCII
fails if any step of it fails. If any step of the ToASCII operation
fails on any label in a domain name, that domain name MUST NOT be
used as an internationalized domain name. The method for dealing
with this failure is application-specific.

The inputs to ToASCII are a sequence of code points, the
AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of
ToASCII is either a sequence of ASCII code points or a failure
condition.

ToASCII never alters a sequence of code points that are all in the
ASCII range to begin with (although it could fail). Applying the
ToASCII operation multiple times has exactly the same effect as
applying it just once.

ToASCII consists of the following steps:

1. If the sequence contains any code points outside the ASCII range
   (0..7F) then proceed to step 2, otherwise skip to step 3.

2. Perform the steps specified in [NAMEPREP] and fail if there is an
   error. The AllowUnassigned flag is used in [NAMEPREP].

3. If the UseSTD3ASCIIRules flag is set, then perform these checks:

   (a) Verify the absence of non-LDH ASCII code points; that is, the
       absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

   (b) Verify the absence of leading and trailing hyphen-minus; that
       is, the absence of U+002D at the beginning and end of the
       sequence.

4. If the sequence contains any code points outside the ASCII range
   (0..7F) then proceed to step 5, otherwise skip to step 8.

5. Verify that the sequence does NOT begin with the ACE prefix.

6. Encode the sequence using the encoding algorithm in [PUNYCODE] and
   fail if there is an error.

7. Prepend the ACE prefix.

8. Verify that the number of code points is in the range 1 to 63
   inclusive.

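The eight steps can be followed one by one in a short sketch. It
assumes the "nameprep" function from the "encodings.idna" module and
the "punycode" codec, both from the Python standard library. That
"nameprep" function takes no flag argument, so the AllowUnassigned
flag is not modeled here. The function name "to_ascii" and the
parameter "use_std3_ascii_rules" are hypothetical, and failure is
reported by raising UnicodeError, which is only one possible choice.

```python
from encodings import idna

ACE_PREFIX = "xn--"


def is_ascii(label):
    """Return True if every code point is in the ASCII range (0..7F)."""
    return all(ord(ch) <= 0x7F for ch in label)


def to_ascii(label, use_std3_ascii_rules=False):
    """Sketch of ToASCII steps 1-8; raises UnicodeError on failure."""
    # Steps 1-2: run Nameprep only when a non-ASCII code point is present.
    if not is_ascii(label):
        label = idna.nameprep(label)
    # Step 3: optional host name checks from [STD3].
    if use_std3_ascii_rules:
        for ch in label:
            if ord(ch) <= 0x7F and not (ch.isalnum() or ch == "-"):
                raise UnicodeError("non-LDH ASCII code point")
        if label.startswith("-") or label.endswith("-"):
            raise UnicodeError("leading or trailing hyphen-minus")
    # Steps 4-7: encode only when non-ASCII code points remain.
    if not is_ascii(label):
        if label.lower().startswith(ACE_PREFIX):
            raise UnicodeError("label begins with the ACE prefix")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    # Step 8: the result must hold between 1 and 63 code points.
    if not 1 <= len(label) <= 63:
        raise UnicodeError("label length out of range")
    return label
```

Because an all-ASCII input skips steps 2 and 5 through 7, applying
"to_ascii" to its own output returns that output unchanged, which
matches the statement that repeated application has the same effect
as a single one.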

4.2 ToUnicode

The ToUnicode operation takes a sequence of Unicode code points that
make up one label and returns a sequence of Unicode code points. If
the input sequence is a label in ACE form, then the result is an
equivalent internationalized label that is not in ACE form, otherwise
the original sequence is returned unaltered.

ToUnicode never fails. If any step fails, then the original input
sequence is returned immediately in that step.

The ToUnicode output never contains more code points than its input.
Note that the number of octets needed to represent a sequence of code
points depends on the particular character encoding used.

The inputs to ToUnicode are a sequence of code points, the
AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of
ToUnicode is always a sequence of Unicode code points.

1. If all code points in the sequence are in the ASCII range (0..7F)
   then skip to step 3.

2. Perform the steps specified in [NAMEPREP] and fail if there is an
   error. (If step 3 of ToASCII is also performed here, it will not
   affect the overall behavior of ToUnicode, but it is not
   necessary.) The AllowUnassigned flag is used in [NAMEPREP].

3. Verify that the sequence begins with the ACE prefix, and save a
   copy of the sequence.

4. Remove the ACE prefix.

5. Decode the sequence using the decoding algorithm in [PUNYCODE] and
   fail if there is an error. Save a copy of the result of this
   step.

6. Apply ToASCII.

7. Verify that the result of step 6 matches the saved copy from step
   3, using a case-insensitive ASCII comparison.

8. Return the saved copy from step 5.

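The ToUnicode steps, including the rule that any failure returns the
original input, can be sketched as below. The sketch assumes the
"nameprep" and "ToASCII" functions from the "encodings.idna" module
and the "punycode" codec of the Python standard library; neither of
those functions accepts the AllowUnassigned or UseSTD3ASCIIRules
flags, so the flags are not modeled. The function name "to_unicode"
is hypothetical.

```python
from encodings import idna

ACE_PREFIX = "xn--"


def to_unicode(label):
    """Sketch of ToUnicode steps 1-8; returns the input on any failure."""
    original = label
    try:
        # Steps 1-2: Nameprep only if a non-ASCII code point is present.
        if any(ord(ch) > 0x7F for ch in label):
            label = idna.nameprep(label)
        # Step 3: require the ACE prefix (any capitalization); save a copy.
        if not label.lower().startswith(ACE_PREFIX):
            return original
        saved = label
        # Steps 4-5: remove the prefix and Punycode-decode the remainder.
        remainder = label[len(ACE_PREFIX):].encode("ascii")
        decoded = remainder.decode("punycode")
        # Steps 6-7: the decoded label must round-trip through ToASCII.
        round_trip = idna.ToASCII(decoded).decode("ascii")
        if round_trip.lower() != saved.lower():
            return original
        # Step 8: return the result saved in step 5.
        return decoded
    except UnicodeError:
        # Any failing step yields the original input sequence.
        return original
```

The round-trip check in steps 6 and 7 is what rejects labels such as
"xn--abc-", whose decoded form is plain ASCII and therefore would
never have been produced by ToASCII.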
|
|||
|
|
|||
|
5. ACE prefix

The ACE prefix, used in the conversion operations (section 4), is two
alphanumeric ASCII characters followed by two hyphen-minuses. It
cannot be any of the prefixes already used in earlier documents,
which includes the following: "bl--", "bq--", "dq--", "lq--", "mq--",
"ra--", "wq--" and "zq--". The ToASCII and ToUnicode operations MUST
recognize the ACE prefix in a case-insensitive manner.

The ACE prefix for IDNA is "xn--" or any capitalization thereof.

This means that an ACE label might be "xn--de-jg4avhby1noc0d", where
"de-jg4avhby1noc0d" is the part of the ACE label that is generated by
the encoding steps in [PUNYCODE].

While all ACE labels begin with the ACE prefix, not all labels
beginning with the ACE prefix are necessarily ACE labels. Non-ACE
labels that begin with the ACE prefix will confuse users and SHOULD
NOT be allowed in DNS zones.

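The difference between "begins with the ACE prefix" and "is an ACE
label" can be illustrated with a short Python sketch. It is a sketch
under assumptions: the function names are hypothetical, and the
round-trip test relies on the standard library's "punycode" codec and
encodings.idna.ToASCII rather than on anything defined in this memo.

```python
from encodings import idna  # standard-library ToASCII (assumed adequate here)

ACE_PREFIX = "xn--"


def has_ace_prefix(label: str) -> bool:
    """Case-insensitive test for the ACE prefix."""
    return label[:len(ACE_PREFIX)].lower() == ACE_PREFIX


def is_ace_label(label: str) -> bool:
    """True only if the label has the prefix and survives a decode/encode round trip."""
    if not has_ace_prefix(label):
        return False
    try:
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        return idna.ToASCII(decoded).decode("ascii").lower() == label.lower()
    except ValueError:  # covers UnicodeError from either codec
        return False
```

With these definitions, "xn--bcher-kva" is an ACE label, whereas
"xn--abc-" carries the prefix but is not one: it decodes to the plain
ASCII label "abc", whose ASCII form is "abc" itself.
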
6. Implications for typical applications using DNS

In IDNA, applications perform the processing needed to input
internationalized domain names from users, display internationalized
domain names to users, and process the inputs and outputs from DNS
and other protocols that carry domain names.

The components and interfaces between them can be represented
pictorially as:

                 +------+
                 | User |
                 +------+
                    ^
                    | Input and display: local interface methods
                    | (pen, keyboard, glowing phosphorus, ...)
+-------------------|-------------------------------+
|                   v                               |
|      +-----------------------------+              |
|      |        Application          |              |
|      |   (ToASCII and ToUnicode    |              |
|      |      operations may be      |              |
|      |        called here)         |              |
|      +-----------------------------+              |
|                   ^        ^                      | End system
|                   |        |                      |
| Call to resolver: |        | Application-specific |
|               ACE |        | protocol:            |
|                   v        | ACE unless the       |
|             +----------+   | protocol is updated  |
|             | Resolver |   | to handle other      |
|             +----------+   | encodings            |
|                   ^        |                      |
+-------------------|--------|----------------------+
      DNS protocol: |        |
                ACE |        |
                    v        v
        +-------------+   +---------------------+
        | DNS servers |   | Application servers |
        +-------------+   +---------------------+

The box labeled "Application" is where the application splits a
domain name into labels, sets the appropriate flags, and performs the
ToASCII and ToUnicode operations. This is described in section 4.

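The work done in the "Application" box can be sketched in Python as
splitting a name into labels and applying ToASCII to each label. The
sketch assumes the standard library's encodings.idna.ToASCII, which
does not expose the AllowUnassigned and UseSTD3ASCIIRules flags, and
it treats the four dot characters listed in section 3.1 as label
separators. The names DOTS and domain_to_ascii are hypothetical.

```python
import re
from encodings import idna  # standard-library ToASCII (assumed adequate here)

# Label separators per section 3.1: U+002E, U+3002, U+FF0E, U+FF61.
DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def domain_to_ascii(name: str) -> str:
    """Split a domain name into labels and convert each one with ToASCII."""
    ascii_labels = []
    for label in DOTS.split(name):
        if label == "":
            # Keep empty labels (for example the root after a trailing dot) as-is.
            ascii_labels.append("")
        else:
            ascii_labels.append(idna.ToASCII(label).decode("ascii"))
    # Always rejoin with the ASCII full stop.
    return ".".join(ascii_labels)
```

For example, domain_to_ascii("www.bücher.example") returns
"www.xn--bcher-kva.example". Passing empty labels through unchanged is
a simplification made for this sketch; a stricter caller might reject
them anywhere except at the end of the name.
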
6.1 Entry and display in applications

Applications can accept domain names using any character set or sets
desired by the application developer, and can display domain names in
any charset. That is, the IDNA protocol does not affect the
interface between users and applications.

An IDNA-aware application can accept and display internationalized
domain names in two formats: the internationalized character set(s)
supported by the application, and as an ACE label. ACE labels that
are displayed or input MUST always include the ACE prefix.
Applications MAY allow input and display of ACE labels, but are not
encouraged to do so except as an interface for special purposes,
possibly for debugging, or to cope with display limitations as
described in section 6.4. ACE encoding is opaque and ugly, and
should thus only be exposed to users who absolutely need it. Because
name labels encoded as ACE name labels can be rendered either as the
encoded ASCII characters or the proper decoded characters, the
application MAY have an option for the user to select the preferred
method of display; if it does, rendering the ACE SHOULD NOT be the
default.

Domain names are often stored and transported in many places. For
example, they are part of documents such as mail messages and web
pages. They are transported in many parts of many protocols, such as
both the control commands and the RFC 2822 body parts of SMTP, and
the headers and the body content in HTTP. It is important to
remember that domain names appear both in domain name slots and in
the content that is passed over protocols.

In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in
any charset allowed by the protocol or document format. If a
protocol or document format only allows one charset, the labels MUST
be given in that charset.

In any place where a protocol or document format allows transmission
of the characters in internationalized labels, internationalized
labels SHOULD be transmitted using whatever character encoding and
escape mechanism that the protocol or document format uses at that
place.

All protocols that use domain name slots already have the capacity
for handling domain names in the ASCII charset. Thus, ACE labels
(internationalized labels that have been processed with the ToASCII
operation) can inherently be handled by those protocols.

6.2 Applications and resolver libraries

Applications normally use functions in the operating system when they
resolve DNS queries. Those functions in the operating system are
often called "the resolver library", and the applications communicate
with the resolver libraries through a programming interface (API).

Because these resolver libraries today expect only domain names in
ASCII, applications MUST prepare labels that are passed to the
resolver library using the ToASCII operation. Labels received from
the resolver library contain only ASCII characters; internationalized
labels that cannot be represented directly in ASCII use the ACE form.
ACE labels always include the ACE prefix.

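A minimal Python sketch of preparing a name before it is handed to a
resolver library is shown below. It assumes Python's built-in "idna"
codec as the ToASCII implementation and uses socket.getaddrinfo only
as an example of a resolver API. The function name resolve and its
resolver parameter are hypothetical, introduced so the conversion can
be exercised without network access.

```python
import socket


def resolve(name: str, resolver=socket.getaddrinfo):
    """Convert every label to its ASCII form, then call the resolver with it."""
    # The built-in "idna" codec splits the name into labels and applies
    # ToASCII to each one, so the resolver only ever sees ASCII.
    ace_name = name.encode("idna").decode("ascii")
    return resolver(ace_name, None)
```

A caller could write resolve("www.bücher.example"); the resolver
function then receives "www.xn--bcher-kva.example".
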
An operating system might have a set of libraries for performing the
ToASCII operation. The input to such a library might be in one or
more charsets that are used in applications (UTF-8 and UTF-16 are
likely candidates for almost any operating system, and script-
specific charsets are likely for localized operating systems).

IDNA-aware applications MUST be able to work with both non-
internationalized labels (those that conform to [STD13] and [STD3])
and internationalized labels.

It is expected that new versions of the resolver libraries in the
future will be able to accept domain names in charsets other than
ASCII, and application developers might one day pass not only domain
names in Unicode, but also in local script to a new API for the
resolver libraries in the operating system. Thus the ToASCII and
ToUnicode operations might be performed inside these new versions of
the resolver libraries.

Domain names passed to resolvers or put into the question section of
DNS requests follow the rules for "queries" from [STRINGPREP].

6.3 DNS servers

Domain names stored in zones follow the rules for "stored strings"
from [STRINGPREP].

For internationalized labels that cannot be represented directly in
ASCII, DNS servers MUST use the ACE form produced by the ToASCII
operation. All IDNs served by DNS servers MUST contain only ASCII
characters.

If a signaling system which makes negotiation possible between old
and new DNS clients and servers is standardized in the future, the
encoding of the query in the DNS protocol itself can be changed from
ACE to something else, such as UTF-8. The question whether or not
this should be used is, however, a separate problem and is not
discussed in this memo.

6.4 Avoiding exposing users to the raw ACE encoding

Any application that might show the user a domain name obtained from
a domain name slot, such as from gethostbyaddr or part of a mail
header, will need to be updated if it is to prevent users from seeing
the ACE.

If an application decodes an ACE name using ToUnicode but cannot show
all of the characters in the decoded name, such as if the name
contains characters that the output system cannot display, the
application SHOULD show the name in ACE format (which always includes
the ACE prefix) instead of displaying the name with the replacement
character (U+FFFD). This is to make it easier for the user to
transfer the name correctly to other programs. Programs that by
default show the ACE form when they cannot show all the characters in
a name label SHOULD also have a mechanism to show the name that is
produced by the ToUnicode operation with as many characters as
possible and replacement characters in the positions where characters
cannot be displayed.

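The display rule above can be sketched in Python as follows. The
sketch makes a simplifying assumption: "the output system can display
a character" is approximated by "the character can be encoded in the
output charset". The function name display_label and its
output_encoding parameter are hypothetical, and Python's built-in
"idna" codec stands in for ToUnicode.

```python
def display_label(ace_label: str, output_encoding: str) -> str:
    """Return the decoded label if it can be shown, otherwise the ACE form."""
    try:
        # The built-in "idna" codec applies ToUnicode to the label.
        decoded = ace_label.encode("ascii").decode("idna")
    except UnicodeError:
        # Not a valid ACE label: show it unchanged, as ToUnicode would.
        return ace_label
    try:
        # Approximation: displayable means encodable in the output charset.
        decoded.encode(output_encoding)
    except UnicodeEncodeError:
        # Prefer the ACE form over a name containing replacement characters.
        return ace_label
    return decoded
```

With a Latin-1 display, display_label("xn--bcher-kva", "latin-1")
gives "bücher"; with an ASCII-only display the same call returns the
ACE form "xn--bcher-kva".
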
The ToUnicode operation does not alter labels that are not valid ACE
labels, even if they begin with the ACE prefix. After ToUnicode has
been applied, if a label still begins with the ACE prefix, then it is
not a valid ACE label, and is not equivalent to any of the
intermediate Unicode strings constructed by ToUnicode.

6.5 DNSSEC authentication of IDN domain names

DNS Security [RFC2535] is a method for supplying cryptographic
verification information along with DNS messages. Public Key
Cryptography is used in conjunction with digital signatures to
provide a means for a requester of domain information to authenticate
the source of the data. This ensures that it can be traced back to a
trusted source, either directly, or via a chain of trust linking the
source of the information to the top of the DNS hierarchy.

IDNA specifies that all internationalized domain names served by DNS
servers that cannot be represented directly in ASCII must use the ACE
form produced by the ToASCII operation. This operation must be
performed prior to a zone being signed by the private key for that
zone. Because of this ordering, it is important to recognize that
DNSSEC authenticates the ASCII domain name, not the Unicode form or
the mapping between the Unicode form and the ASCII form. In the
presence of DNSSEC, this is the name that MUST be signed in the zone
and MUST be validated against.

One consequence of this for sites deploying IDNA in the presence of
DNSSEC is that any special purpose proxies or forwarders used to
transform user input into IDNs must be earlier in the resolution flow
than DNSSEC authenticating nameservers for DNSSEC to work.

7. Name server considerations

Existing DNS servers do not know the IDNA rules for handling non-
ASCII forms of IDNs, and therefore need to be shielded from them.
All existing channels through which names can enter a DNS server
database (for example, master files [STD13] and DNS update messages
[RFC2136]) are IDN-unaware because they predate IDNA, and therefore
requirement 2 of section 3.1 of this document provides the needed
shielding, by ensuring that internationalized domain names entering
DNS server databases through such channels have already been
converted to their equivalent ASCII forms.

It is imperative that there be only one ASCII encoding for a
particular domain name. Because of the design of the ToASCII and
ToUnicode operations, there are no ACE labels that decode to ASCII
labels, and therefore name servers cannot contain multiple ASCII
encodings of the same domain name.

[RFC2181] explicitly allows domain labels to contain octets beyond
the ASCII range (0..7F), and this document does not change that.
Note, however, that there is no defined interpretation of octets
80..FF as characters. If labels containing these octets are returned
to applications, unpredictable behavior could result. The ASCII form
defined by ToASCII is the only standard representation for
internationalized labels in the current DNS protocol.

8. Root server considerations

IDNs are likely to be somewhat longer than current domain names, so
the bandwidth needed by the root servers is likely to go up by a
small amount. Also, queries and responses for IDNs will probably be
somewhat longer than typical queries today, so more queries and
responses may be forced to go to TCP instead of UDP.

9. References

9.1 Normative References

[RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

[STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
             Internationalized Strings ("stringprep")", RFC 3454,
             December 2002.

[NAMEPREP]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
             Profile for Internationalized Domain Names (IDN)", RFC
             3491, March 2003.

[PUNYCODE]   Costello, A., "Punycode: A Bootstring encoding of
             Unicode for use with Internationalized Domain Names in
             Applications (IDNA)", RFC 3492, March 2003.

[STD3]       Braden, R., "Requirements for Internet Hosts --
             Communication Layers", STD 3, RFC 1122, and
             "Requirements for Internet Hosts -- Application and
             Support", STD 3, RFC 1123, October 1989.

[STD13]      Mockapetris, P., "Domain names - concepts and
             facilities", STD 13, RFC 1034 and "Domain names -
             implementation and specification", STD 13, RFC 1035,
             November 1987.

9.2 Informative References

[RFC2535]    Eastlake, D., "Domain Name System Security Extensions",
             RFC 2535, March 1999.

[RFC2181]    Elz, R. and R. Bush, "Clarifications to the DNS
             Specification", RFC 2181, July 1997.

[UAX9]       Unicode Standard Annex #9, The Bidirectional Algorithm,
             <http://www.unicode.org/unicode/reports/tr9/>.

[UNICODE]    The Unicode Consortium. The Unicode Standard, Version
             3.2.0 is defined by The Unicode Standard, Version 3.0
             (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
             as amended by the Unicode Standard Annex #27: Unicode
             3.1 (http://www.unicode.org/reports/tr27/) and by the
             Unicode Standard Annex #28: Unicode 3.2
             (http://www.unicode.org/reports/tr28/).

[USASCII]    Cerf, V., "ASCII format for Network Interchange", RFC
             20, October 1969.

10. Security Considerations

Security on the Internet partly relies on the DNS. Thus, any change
to the characteristics of the DNS can change the security of much of
the Internet.

This memo describes an algorithm which encodes characters that are
not valid according to STD3 and STD13 into octet values that are
valid. No security issues such as string length increases or new
allowed values are introduced by the encoding process or the use of
these encoded values, apart from those introduced by the ACE encoding
itself.

Domain names are used by users to identify and connect to Internet
servers. The security of the Internet is compromised if a user
entering a single internationalized name is connected to different
servers based on different interpretations of the internationalized
domain name.

When systems use local character sets other than ASCII and Unicode,
this specification leaves the problem of transcoding between the
local character set and Unicode up to the application. If different
applications (or different versions of one application) implement
different transcoding rules, they could interpret the same name
differently and contact different servers. This problem is not
solved by security protocols like TLS that do not take local
character sets into account.

Because this document normatively refers to [NAMEPREP], [PUNYCODE],
and [STRINGPREP], it includes the security considerations from those
documents as well.

If or when this specification is updated to use a more recent Unicode
normalization table, the new normalization table will need to be
compared with the old to spot backwards incompatible changes. If
there are such changes, they will need to be handled somehow, or
there will be security as well as operational implications. Methods
to handle the conflicts could include keeping the old normalization,
or taking care of the conflicting characters by operational means, or
some other method.

Implementations MUST NOT use more recent normalization tables than
the one referenced from this document, even though more recent tables
may be provided by operating systems. If an application is unsure of
which version of the normalization tables is in the operating
system, the application needs to include the normalization tables
itself. Using normalization tables other than the one referenced
from this specification could have security and operational
implications.

To help prevent confusion between characters that are visually
similar, it is suggested that implementations provide visual
indications where a domain name contains multiple scripts. Such
mechanisms can also be used to show when a name contains a mixture of
simplified and traditional Chinese characters, or to distinguish zero
and one from O and l. DNS zone administrators may impose restrictions
(subject to the limitations in section 2) that try to minimize
homographs.

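One way an implementation might decide when to show such a visual
indication is sketched below in Python. This is a rough heuristic and
not part of IDNA: the standard library does not expose the Unicode
script property, so the sketch approximates a character's script by
the first word of its Unicode character name. The function names are
hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Heuristic: names look like "LATIN SMALL LETTER A" or
            # "CYRILLIC SMALL LETTER A"; take the first word as the script.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the letters of the label appear to come from more than one script."""
    return len(scripts_in(label)) > 1
```

For example, is_mixed_script("p\u0430ypal") is true because the second
character is CYRILLIC SMALL LETTER A, not the Latin letter it
resembles. Some legitimate names mix scripts (Japanese names commonly
combine several), so a result of this kind is better suited to
triggering a visual hint than to rejecting a name.
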
Domain names (or portions of them) are sometimes compared against a
set of privileged or anti-privileged domains. In such situations it
is especially important that the comparisons be done properly, as
specified in section 3.1 requirement 4. For labels already in ASCII
form, the proper comparison reduces to the same case-insensitive
ASCII comparison that has always been used for ASCII labels.

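The comparison referred to above (two labels match if and only if
their ToASCII forms match under a case-insensitive ASCII comparison)
can be sketched in Python. The sketch assumes the standard library's
encodings.idna.ToASCII. The function name labels_match is
hypothetical, and treating a label that fails ToASCII as matching
nothing is a choice made for this sketch.

```python
from encodings import idna  # standard-library ToASCII (assumed adequate here)


def labels_match(a: str, b: str) -> bool:
    """Compare two labels by their ASCII forms, ignoring ASCII case."""
    try:
        # ToASCII returns bytes; bytes.lower() folds only ASCII letters.
        return idna.ToASCII(a).lower() == idna.ToASCII(b).lower()
    except UnicodeError:
        # Assumption for this sketch: an unconvertible label matches nothing.
        return False
```

Under this sketch, "Bücher" matches "xn--bcher-kva" and "example"
matches "EXAMPLE", while "bücher" does not match "bucher".
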
The introduction of IDNA means that any existing labels that start
with the ACE prefix and would be altered by ToUnicode will
automatically be ACE labels, and will be considered equivalent to
non-ASCII labels, whether or not that was the intent of the zone
administrator or registrant.

11. IANA Considerations

IANA has assigned the ACE prefix in consultation with the IESG.

12. Authors' Addresses

Patrik Faltstrom
Cisco Systems
Arstaangsvagen 31 J
S-117 43 Stockholm Sweden

EMail: paf@cisco.com


Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA

EMail: phoffman@imc.org


Adam M. Costello
University of California, Berkeley

URL: http://www.nicemice.net/amc/

13. Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the
Internet Society.