Re: AW: treatment of U+002E that is produced by NFKC
From: Erik van der Poel
Subject: Re: AW: treatment of U+002E that is produced by NFKC
Date: Tue, 15 Jan 2008 07:42:27 -0800
Looks good to me, apart from your interpretation of RFC 3490, which
leads to the insertion of 0x2E into a DNS label. I guess you and I will
simply have to agree to disagree on this point. RFC 3490 should have
been clearer. By
the way, I did a Web search for "2024 nfkc" and found that this issue
was raised, but I guess it was not resolved adequately:
http://www.ops.ietf.org/lists/idn/idn.2001/msg02450.html
Erik
On Jan 15, 2008 7:15 AM, Simon Josefsson <address@hidden> wrote:
> "Erik van der Poel" <address@hidden> writes:
>
> > Yes, that's right.
> >
> > By the way, there may be a different way to address this issue. If
> > libidn has a separate API for NFKC or Nameprep, the caller could pass
> > the entire domain name (including all of the dots and dot-like
> > characters) through NFKC (or Nameprep) first, and then call the normal
> > IDNA routine. This is quite likely to behave the same way as MSIE 7
> > and Firefox 2. If you chose this approach, you could simply document
> > this somewhere, and callers could then decide whether or not to go
> > this way.
>
> Libidn has a simple NFKC interface, and I'm documenting that approach
> now. Below is the current text in the manual. I'll forward this to the
> Firefox IDN guys to see if they are interested in documenting their
> practice further, possibly in an I-D. If ToASCII(NFKC(i)) turns out to
> actually work and behave better than RFC 3490, documenting that now
> seems useful.
>
> Thanks,
> /Simon
>
> Appendix B On Label Separators
> ******************************
>
> Some strings contain characters whose NFKC normalized form contains the
> ASCII dot (0x2E, "."). Examples of these characters are U+2024 (ONE
> DOT LEADER) and U+248C (DIGIT FIVE FULL STOP). The strings have the
> interesting property that their IDNA ToASCII output will contain
> embedded dots. For example:
>
> ToASCII (hi U+248C com) = hi5.com
> ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u
>
> These demonstrate the two general cases. In the first, the ASCII dot
> is part of an output that does not begin with the IDN prefix "xn--".
> In the second, the dot is part of an output that is prefixed with
> "xn--".
>
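To see where the embedded dots come from, here is a minimal sketch using Python's standard `unicodedata` module. It is only an illustration, not Libidn's code: the helper name `nfkc_dot_sources` is made up for this example, and Python's tables follow a newer Unicode version than the Unicode 3.2 tables that Nameprep specifies (the two characters discussed here decompose the same way in both).

```python
import unicodedata

def nfkc_dot_sources(text):
    """Return (character, NFKC form) pairs for every character other than
    U+002E itself whose NFKC form contains an ASCII dot.
    Hypothetical helper, named for this example only."""
    pairs = []
    for ch in text:
        normalized = unicodedata.normalize("NFKC", ch)
        if ch != "." and "." in normalized:
            pairs.append((ch, normalized))
    return pairs

# U+248C (DIGIT FIVE FULL STOP) expands to the two characters "5" and ".".
print(nfkc_dot_sources("hi\u248ccom"))
# U+2024 (ONE DOT LEADER) maps to a plain "." on its own.
print(nfkc_dot_sources("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))
```

A caller could use a check like this to detect inputs that need special handling before they reach ToASCII.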
> The input strings are, from the DNS point of view, a single label.
> The IDNA algorithm translates one label at a time. Thus, the output is
> expected to be only one label. What is important here is to make sure
> the DNS resolver receives the correct query. The DNS protocol does not
> use the dot to delimit labels on the wire; rather, it uses length-value
> pairs. Thus the correct query would be for `{7}hi5.com' and
> `{22}xn--rksmrgs.com-l8as9u' respectively.
>
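The length-value encoding can be sketched as follows. This is a simplified illustration of the uncompressed wire format, with a made-up function name `encode_qname`. It also appends the zero-length root label that ends every name on the wire, which the `{7}hi5.com' notation above leaves out.

```python
def encode_qname(labels):
    """Encode a list of ASCII labels as an uncompressed DNS name:
    one length octet followed by the label bytes, for each label,
    then a zero octet for the root. Hypothetical helper for illustration."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError("label must be 1 to 63 octets long")
        out.append(len(raw))   # length octet
        out += raw             # label bytes, dots included verbatim
    out.append(0)              # root label terminates the name
    return bytes(out)

# A single seven-octet label that happens to contain a dot:
print(encode_qname(["hi5.com"]))
# Two labels, which is what a user reading "hi5.com" would expect:
print(encode_qname(["hi5", "com"]))
```

The two calls produce different byte strings, which is why the resolver sees two different queries even though both names print as "hi5.com".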
> Some implementations (1) have decided that these input strings are
> potentially confusing for the user. The string "hi U+248C com" looks
> like "hi5.com" on systems that support Unicode properly. These
> implementations do not follow RFC 3490. They yield:
>
> ToASCII (hi U+248C com) = hi5.com
> ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com
>
> The DNS queries they perform are `{3}hi5{3}com' and
> `{18}xn--rksmrgs-5wao1o{3}com' respectively. Arguably, this leads to a
> better user experience, and suggests that the IDNA specification is
> sub-optimal in this area.
>
> B.1 Recommended Workaround
> ==========================
>
> One suggestion is to normalize the entire input string using NFKC
> before passing it to IDNA ToASCII. You may use
> `stringprep_utf8_nfkc_normalize' or `stringprep_ucs4_nfkc_normalize'.
> This avoids the problem, and appears to lead to behaviour similar to
> that of IE/Firefox.
>
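The ToASCII(NFKC(i)) idea can be sketched with Python's standard library standing in for Libidn. This is an assumption-laden illustration, not the Libidn API: `to_ascii_after_nfkc` is a hypothetical name, `unicodedata.normalize` plays the role of the `stringprep_*_nfkc_normalize' functions, and Python's built-in `idna` codec plays the role of per-label ToASCII (it splits the name on dots before converting each label).

```python
import unicodedata

def to_ascii_after_nfkc(domain):
    """Normalize the whole domain name with NFKC first, so dot-like
    characters become U+002E label separators, then convert each
    label with the standard "idna" codec. Hypothetical helper."""
    normalized = unicodedata.normalize("NFKC", domain)
    return normalized.encode("idna").decode("ascii")

print(to_ascii_after_nfkc("hi\u248ccom"))
print(to_ascii_after_nfkc("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))
```

With the normalization done up front, both inputs are split into two labels, which matches the IE/Firefox results quoted above rather than the single-label reading of RFC 3490.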
> Alternative workarounds are being considered. Eventually Libidn may
> add a new flag to the `idna_*' functions that implements a
> recommended way to work around this problem.
>
> ---------- Footnotes ----------
>
> (1) Notably Microsoft's Internet Explorer and Mozilla's Firefox, but
> not Apple's Safari.
>