Re: [Nmh-workers] bug in decode_rfc2047()

From:	Ken Hornstein
Subject:	Re: [Nmh-workers] bug in decode_rfc2047()
Date:	Thu, 03 Jan 2013 13:31:45 -0500

>The root of all this is iconv's behavior that requires us to
>skip past the invalid character.  Looking at it now, I wonder if
>we can do better than the current special handling for UTF-8?
>It's the "fromutf8" block below:
>[...]

Hm.  I played around with this a bit, and I'm not sure what to do.

iconv() doesn't distinguish between "We can't convert this character to
the target character set" and "This multibyte sequence is invalid"; they
both get EILSEQ.  Even worse, we can't (portably) tell where the end of
a multibyte sequence is.
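
To make the point above concrete, here is a minimal probe, offered as a sketch rather than anything from nmh. The helper name first_iconv_errno() is hypothetical, and the encoding names "UTF-8" and "US-ASCII" are assumed to be accepted by the local iconv_open(). It runs one iconv() call and reports the errno of the first failure, which is all the information the caller gets.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper (not nmh code): convert `inlen` bytes of `in`
 * from encoding `from` to encoding `to` and return the errno of the
 * first failing call, or 0 if everything converted.  The converted
 * text itself is thrown away; only the error code is of interest.
 */
int first_iconv_errno(const char *from, const char *to,
                      const char *in, size_t inlen)
{
    char out[64];
    char *inp = (char *)in;   /* iconv() wants a non-const char ** */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = sizeof out;
    int err = 0;
    iconv_t cd;

    assert(from != NULL && to != NULL && in != NULL);

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return errno;         /* e.g. EINVAL: conversion not supported */

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        err = errno;          /* save before iconv_close() can touch it */

    iconv_close(cd);
    return err;
}
```

Feeding it a byte that can never appear in UTF-8 (0xFF) yields EILSEQ. On implementations that behave as described above, feeding it a well-formed character that the target cannot represent (for example "\xc3\xa9" converted to US-ASCII) yields the same EILSEQ, so the caller cannot tell the two cases apart from the error code alone.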

So, I see a couple of options.  We could go completely portable and put
in a "?" (or whatever) for every byte that's invalid.  That would have
us generate multiple "?" for multibyte character sets like UTF-8.  We could
suppress multiple invalid bytes in a row so there's just one "?", but
that seems kinda lousy to me.
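
The fully portable option, with and without suppressing runs, could look roughly like the sketch below. It is an illustration under assumptions, not nmh's code: convert_lossy() and its `collapse` flag are hypothetical names, and writing a raw '?' into the output assumes the target encoding is stateless and ASCII-compatible.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper (not nmh code).  Converts `inlen` bytes of `in`
 * into `out`, which is always NUL-terminated and holds at most
 * `outsize` bytes.  Whenever iconv() rejects the input, exactly ONE
 * input byte is skipped and a '?' is written in its place.  If
 * `collapse` is nonzero, a run of consecutive rejected bytes produces
 * a single '?'.  Returns 0 on success, -1 on failure.
 */
int convert_lossy(const char *from, const char *to,
                  const char *in, size_t inlen,
                  char *out, size_t outsize, int collapse)
{
    char *inp = (char *)in;   /* iconv() wants a non-const char ** */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft;
    int last_was_subst = 0;   /* did the previous step write a '?' */
    int rc = 0;
    iconv_t cd;

    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return -1;
    out[0] = '\0';
    outleft = outsize - 1;    /* keep one byte for the terminator */

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    while (inleft > 0) {
        char *before = outp;
        size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
        int err = errno;

        if (outp != before)
            last_was_subst = 0;   /* good output ended the bad run */
        if (r != (size_t)-1)
            break;                /* all remaining input converted */
        if (err != EILSEQ && err != EINVAL) {
            rc = -1;              /* E2BIG or something unexpected */
            break;
        }
        if (!(collapse && last_was_subst)) {
            if (outleft == 0) {
                rc = -1;
                break;
            }
            *outp++ = '?';
            outleft--;
        }
        last_was_subst = 1;
        inp++;                    /* skip one byte: the portable choice */
        inleft--;
        iconv(cd, NULL, NULL, NULL, NULL);   /* back to initial state */
    }

    *outp = '\0';
    iconv_close(cd);
    return rc;
}
```

Converting the four bytes 'a', 0xFF, 0xFE, 'b' from UTF-8 to US-ASCII should give "a??b" with collapse set to 0 and "a?b" with collapse set to 1. The second form hides how many characters were lost, which is the trade-off mentioned above.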

GNU libiconv (which it seems a fair number of people use) has
an iconvctl() function with an undocumented option that lets
you supply your own substitution function for invalid bytes/codepoints.
That function isn't part of POSIX.  The fact that it's undocumented and
nonstandard makes me think we shouldn't use it.

Unless we have a LOT of multibyte character sets to deal with, perhaps
the special-case here for UTF-8 is the best alternative?  Any other thoughts
on this matter?
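
For reference, a UTF-8 special case can work because the lead byte of a UTF-8 sequence announces its length, so the caller can skip the whole character and emit a single "?". The sketch below shows one way to compute that skip length. The name utf8_skip_len() is hypothetical, and this is not claimed to match the existing "fromutf8" block.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not nmh code): given `left` available bytes
 * starting at `s`, return how many bytes to skip so that one whole
 * UTF-8 sequence (or one stray byte) is passed over.  Returns 0 only
 * when `left` is 0.
 */
size_t utf8_skip_len(const unsigned char *s, size_t left)
{
    size_t want, n;

    assert(s != NULL);
    if (left == 0)
        return 0;

    if (s[0] < 0x80)
        want = 1;                       /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0)
        want = 2;                       /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0)
        want = 3;                       /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0)
        want = 4;                       /* 11110xxx */
    else
        return 1;                       /* stray continuation or bad lead */

    /* Consume only the continuation bytes (10xxxxxx) actually present,
     * so a truncated sequence does not swallow the text that follows. */
    for (n = 1; n < want && n < left && (s[n] & 0xC0) == 0x80; n++)
        ;
    return n;
}
```

On EILSEQ the caller would advance the input by utf8_skip_len() bytes instead of one, so an unconvertible three-byte character produces one "?" rather than three. Other multibyte encodings would each need their own length rule, which is why this only pays off if UTF-8 is the main case.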

--Ken
