Re: AW: treatment of U+002E that is produced by NFKC
From: Simon Josefsson
Subject: Re: AW: treatment of U+002E that is produced by NFKC
Date: Tue, 15 Jan 2008 16:51:16 +0100
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/22.1 (gnu/linux)

"Erik van der Poel" <address@hidden> writes:
> Looks good to me.
>
> Other than your interpretation of RFC 3490 leading to the insertion of
> 0x2E into a DNS label, but I guess you and I will simply have to agree
> that we disagree on this point. RFC 3490 should have been clearer.
I regard escaping 0x2E as the logical consequence of the IDNA design to
operate on single labels and of how U+2024 etc. behave under NFKC. I think
RFC 3490 didn't intend for ToASCII to be able to take one label and
output two labels. I suspect the reason for the problems here is that
there was a perception that ToASCII would never produce new 0x2E's. But
I can't say for sure.
> By the way, I did a Web search for "2024 nfkc" and found that this
> issue was raised, but I guess it was not resolved adequately:
>
> http://www.ops.ietf.org/lists/idn/idn.2001/msg02450.html
Interesting.
/Simon
> Erik
>
> On Jan 15, 2008 7:15 AM, Simon Josefsson <address@hidden> wrote:
>> "Erik van der Poel" <address@hidden> writes:
>>
>> > Yes, that's right.
>> >
>> > By the way, there may be a different way to address this issue. If
>> > libidn has a separate API for NFKC or Nameprep, the caller could pass
>> > the entire domain name (including all of the dots and dot-like
>> > characters) through NFKC (or Nameprep) first, and then call the normal
>> > IDNA routine. This is quite likely to behave the same way as MSIE 7
>> > and Firefox 2. If you chose this approach, you could simply document
>> > this somewhere, and callers could then decide whether or not to go
>> > this way.
>>
>> Libidn has a simple NFKC interface, and I'm documenting that approach
>> now. Below is the current text in the manual. I'll forward this to the
>> Firefox IDN guys to see if they are interested in documenting their
>> practice further, possibly in an I-D. If ToASCII(NFKC(i)) turns out to
>> actually work and behave better than RFC 3490, documenting that now
>> seems useful.
>>
>> Thanks,
>> /Simon
>>
>> Appendix B On Label Separators
>> ******************************
>>
>> Some strings contain characters whose NFKC normalized form contains the
>> ASCII dot (0x2E, "."). Examples of these characters are U+2024 (ONE
>> DOT LEADER) and U+248C (DIGIT FIVE FULL STOP). The strings have the
>> interesting property that their IDNA ToASCII output will contain
>> embedded dots. For example:
>>
>> ToASCII (hi U+248C com) = hi5.com
>> ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u
>>
>> These demonstrate the two general cases: the first, where the ASCII dot
>> is part of an output that does not begin with the IDN prefix "xn--", and
>> the second, where the dot is part of an output that is prefixed with
>> "xn--".
>>
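To check which characters behave this way, here is a minimal sketch in
Python, standing in for libidn and using only the standard `unicodedata`
module. The helper name `nfkc_dot_sources` is made up for illustration.

```python
import unicodedata


def nfkc_dot_sources(text):
    """Return the characters of `text` that are not the ASCII dot
    themselves but whose NFKC form contains one (hypothetical helper)."""
    return [
        ch for ch in text
        if ch != "." and "." in unicodedata.normalize("NFKC", ch)
    ]


label = "hi\u248ccom"  # "hi", DIGIT FIVE FULL STOP, "com"
print(unicodedata.normalize("NFKC", label))                   # hi5.com
print([f"U+{ord(c):04X}" for c in nfkc_dot_sources(label)])  # ['U+248C']
```

Running the helper over an input before calling ToASCII is one way to
detect labels that would end up with an embedded dot.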
>> The input strings are, from the DNS point of view, a single label.
>> The IDNA algorithm translates one label at a time. Thus, the output is
>> expected to be only one label. What is important here is to make sure
>> the DNS resolver receives the correct query. The DNS protocol does not
>> use the dot to delimit labels on the wire, rather it uses length-value
>> pairs. Thus the correct query would be for `{7}hi5.com' and
>> `{22}xn--rksmrgs.com-l8as9u' respectively.
>>
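The difference between the two readings only shows up on the wire. As a
rough sketch in Python (the function name `wire_name` is hypothetical, and
the terminating zero-length root label comes from RFC 1035 rather than
from the text above), the length-prefixed encoding looks like this:

```python
def wire_name(labels):
    """Encode a list of ASCII labels as a DNS name in wire format:
    one length octet per label, followed by the label bytes."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label length out of range: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length root label ends the name
    return bytes(out)


# One label that happens to contain 0x2E, versus two separate labels.
print(wire_name(["hi5.com"]))     # b'\x07hi5.com\x00'
print(wire_name(["hi5", "com"]))  # b'\x03hi5\x03com\x00'
```

The dot in the first query is ordinary label data, not a separator,
which is why the two queries name different things.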
>> Some implementations (1) have decided that these input strings are
>> potentially confusing for the user. The string "hi U+248C com" looks
>> like "hi5.com" on systems that support Unicode properly. These
>> implementations do not follow RFC 3490. They yield:
>>
>> ToASCII (hi U+248C com) = hi5.com
>> ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com
>>
>> The DNS queries they perform are `{3}hi5{3}com' and
>> `{18}xn--rksmrgs-5wao1o{3}com' respectively. Arguably, this leads to a
>> better user experience, and suggests that the IDNA specification is
>> sub-optimal in this area.
>>
>> B.1 Recommended Workaround
>> ==========================
>>
>> It has been suggested to normalize the entire input string using NFKC
>> before passing it to IDNA ToASCII. You may use
>> `stringprep_utf8_nfkc_normalize' or `stringprep_ucs4_nfkc_normalize'.
>> This will avoid the problem, and appears to lead to similar behaviour
>> as IE/Firefox.
>>
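The same order of operations can be tried out without libidn. The sketch
below is an illustration in Python, not the libidn API: it uses
`unicodedata.normalize` in place of `stringprep_utf8_nfkc_normalize` and
Python's built-in "idna" codec (an RFC 3490 implementation) in place of
the `idna_*` functions. The function name `to_ascii_after_nfkc` is made
up.

```python
import unicodedata


def to_ascii_after_nfkc(name):
    """Normalize the whole name with NFKC first, so dot-like characters
    become real separators, then run per-label ToASCII."""
    normalized = unicodedata.normalize("NFKC", name)
    # The "idna" codec splits on dots and applies ToASCII to each label.
    return normalized.encode("idna").decode("ascii")


print(to_ascii_after_nfkc("hi\u248ccom"))
print(to_ascii_after_nfkc("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))
```

With this ordering, both inputs should come out as two labels, matching
the IE/Firefox results quoted above.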
>> Alternative workarounds are being considered. Eventually Libidn may
>> implement a new flag to the `idna_*' functions that implements a
>> recommended way to work around this problem.
>>
>> ---------- Footnotes ----------
>>
>> (1) Notably Microsoft's Internet Explorer and Mozilla's Firefox, but
>> not Apple's Safari.
>>