Re: [Monotone-devel] Re: Tests passing!
From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: Tests passing!
Date: Fri, 2 Mar 2007 01:53:09 -0800
User-agent: Mutt/1.5.13 (2006-08-11)
On Fri, Mar 02, 2007 at 12:08:03AM +0100, Lapo Luchini wrote:
> Nathaniel Smith wrote:
> > On Thu, Mar 01, 2007 at 12:12:32PM +0100, Lapo Luchini wrote:
> >> [AC_CACHE_CHECK([if iconv supports //IGNORE//TRANSLIT],
> > Seriously, there is no such thing as //IGNORE//TRANSLIT. Anywhere.
>
> What do you mean "there is no such thing"? 0_o
> At least in GNU iconv there *sure* is!
>
[code snipped]
>
> To me, this accepts "*//IGNORE//TRANSLIT" and behaves just as if "*" had
> been selected, but with the "transliterate" and "discard_ilseq" flags set.
Hmm, you're quite right. I hadn't realized that GNU libiconv and GNU
libc iconv were two totally different codebases -- I read the glibc
code earlier when Ulrich made his comments, and his comments are
accurate for that code.

So my current understanding:

  libiconv:
    //IGNORE,TRANSLIT or //TRANSLIT,IGNORE are like specifying nothing
    at all, you get no transliteration.
    //IGNORE//TRANSLIT and //TRANSLIT//IGNORE are identical, and both do
    what you'd want.  AFAICT, this means transliterating when possible,
    and ignoring (not inserting question marks!) otherwise.
  old glibc:
    //IGNORE,TRANSLIT or //TRANSLIT,IGNORE are like specifying nothing
    at all, you get no transliteration.
    //IGNORE//TRANSLIT is treated like //IGNORE, and //TRANSLIT//IGNORE
    is treated like //TRANSLIT.
  modern glibc:
    //IGNORE,TRANSLIT or //TRANSLIT,IGNORE do what you'd want (AFAICT,
    not as sure on this one).  I think they're both identical to
    //TRANSLIT, in practice (which does a fallback "transliterate
    everything to question mark" thing).
    //IGNORE//TRANSLIT is treated like //IGNORE, and //TRANSLIT//IGNORE
    is treated like //TRANSLIT.
  everywhere else:
    probably none of this stuff works at all, and conceivably even
    sticking //foo on the end of your charset will cause one of those
    "unrecognized charset" errors?
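
For anyone who wants to check the "everywhere else" case on their own
system, here is a rough probe in C.  It is only a sketch: suffix_accepted
is a made-up name, and "ASCII" and "UTF-8" are assumed to be charset
names the local iconv knows.  Note that it only reports whether
iconv_open accepts the suffix; as the list above shows, an accepted
suffix may still be silently ignored.

```c
#include <assert.h>
#include <iconv.h>
#include <stdio.h>

/* Hypothetical probe: returns 1 if iconv_open() accepts "ASCII<suffix>"
 * as a target charset (converting from UTF-8), 0 otherwise.
 * Acceptance does not prove the suffix has any effect. */
static int suffix_accepted(const char *suffix)
{
    char target[64];
    iconv_t cd;

    assert(suffix != NULL);
    snprintf(target, sizeof target, "ASCII%s", suffix);

    cd = iconv_open(target, "UTF-8");
    if (cd == (iconv_t)-1)
        return 0;               /* e.g. an "unrecognized charset" error */

    iconv_close(cd);
    return 1;
}
```

Calling it with "//TRANSLIT", "//IGNORE//TRANSLIT", "//IGNORE,TRANSLIT"
and "//foo" shows which spellings a given iconv rejects outright.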

I guess we always want question marks, so we're going to have to insert
them by hand in at least some cases, and we'd rather not deal with all
of this insanity.  So maybe we should consider:
  -- try opening our iconv handle with //TRANSLIT
  -- if that fails, try opening it the normal way (to account for any
     systems that just don't know //TRANSLIT)
  -- when we actually process bytes using this iconv handle, do
     poor-man's-TRANSLIT handling -- whenever iconv says EILSEQ or
     EINVAL, then dump a question mark to output, advance the input,
     and try again.
This seems like it might be the minimal necessary-and-sufficient code?
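
A rough sketch of those three steps in C, to make the idea concrete.
This is not monotone code: open_converter and convert_lossy are
hypothetical names, and the sketch assumes a stateless, ASCII-compatible
target charset (so a raw '?' byte is valid output) and that skipping one
input byte at a time is acceptable.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper: try "<to>//TRANSLIT" first, then plain "<to>"
 * for systems that do not know the suffix.  Returns (iconv_t)-1 if
 * both attempts fail. */
static iconv_t open_converter(const char *to, const char *from)
{
    char target[128];
    iconv_t cd;

    assert(to != NULL && from != NULL);
    snprintf(target, sizeof target, "%s//TRANSLIT", to);

    cd = iconv_open(target, from);
    if (cd == (iconv_t)-1)
        cd = iconv_open(to, from);
    return cd;
}

/* Hypothetical helper: poor-man's TRANSLIT.  Converts in[0..inlen) into
 * out, writing '?' and skipping one input byte whenever iconv reports
 * EILSEQ or EINVAL.  Returns the number of bytes written, or (size_t)-1
 * if the output buffer is too small or another error occurs. */
static size_t convert_lossy(iconv_t cd, const char *in, size_t inlen,
                            char *out, size_t outcap)
{
    char *inp = (char *)in;     /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t outleft = outcap;

    while (inlen > 0) {
        if (iconv(cd, &inp, &inlen, &outp, &outleft) != (size_t)-1)
            break;              /* everything remaining was converted */

        if ((errno != EILSEQ && errno != EINVAL) ||
            inlen == 0 || outleft == 0)
            return (size_t)-1;  /* E2BIG or something unexpected */

        *outp++ = '?';          /* dump a question mark to output */
        outleft--;
        inp++;                  /* advance the input past the bad byte */
        inlen--;
    }
    return outcap - outleft;
}
```

Skipping a single byte means a multi-byte sequence that iconv rejects
can produce several question marks in a row; whether that matters is a
separate decision.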
-- Nathaniel
--
Eternity is very long, especially towards the end.
-- Woody Allen