
[bug-gnu-libiconv] MS-ANSI query


From: Peter Flynn
Subject: [bug-gnu-libiconv] MS-ANSI query
Date: Mon, 27 Apr 2009 13:49:38 +0100
User-agent: Thunderbird 2.0.0.21 (X11/20090318)

I'm writing an RSS feed from a LISTSERV list fed by internal (Exchange) mail. The messages could have any content transfer encoding (most are Base64 or quoted-printable, and are handled by mewdecode) and any charset, but in all cases the resulting text is passed through iconv to convert it to UTF-8.

This doesn't work for messages sent in the undistinguished "8BIT" content transfer encoding. On examination, their text appears to be Windows ANSI in all local cases so far, so I'm prepared to live with that assumption.

The feed is managed by a Bash shell script and gawk under RHEL5, and the resulting UTF-8 XML is passed to Cocoon for feed generation. The gawk script passes the text through the pipeline

<stuff> | iconv -f <charset> -t utf8 >more stuff
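The decode-then-convert step described above can be sketched in Python. This is an illustration only: the real script uses mewdecode, gawk and the iconv command, and the names body_as_utf8 and FALLBACK_CHARSET are hypothetical. The standard-library email module undoes Base64 or quoted-printable, and unlabelled 8BIT text falls back to cp1252, which is an assumption that follows the "Windows ANSI" guess for a Western European setup.

```python
import email

# Hypothetical fallback: 8BIT messages with no charset parameter are assumed
# to be Windows ANSI, taken here to mean code page 1252 (Western European).
FALLBACK_CHARSET = "cp1252"


def body_as_utf8(raw_message: bytes, fallback: str = FALLBACK_CHARSET) -> bytes:
    """Undo the content transfer encoding, then re-encode the text as UTF-8."""
    msg = email.message_from_bytes(raw_message)
    # decode=True reverses base64 / quoted-printable (mewdecode's job here);
    # for 7bit/8bit bodies it returns the raw body bytes unchanged.
    body = msg.get_payload(decode=True)
    # get_content_charset() returns None when there is no charset parameter.
    charset = msg.get_content_charset() or fallback
    # Strict decoding raises on bytes the charset does not define,
    # roughly as iconv stops with "illegal input sequence".
    return body.decode(charset).encode("utf-8")


if __name__ == "__main__":
    labelled = (b"Content-Type: text/plain; charset=iso-8859-1\n"
                b"Content-Transfer-Encoding: base64\n\nY2Fm6Q==\n")
    unlabelled = b"Content-Transfer-Encoding: 8bit\n\nx\xf3y"
    print(body_as_utf8(labelled))
    print(body_as_utf8(unlabelled))
```

Decoding strictly (Python's default) keeps the behaviour close to iconv without -c, which reports the first unmappable byte instead of silently dropping it.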

For the 8BIT messages I have tried MS-ANSI as the -f charset, but this fails; a typical error message is

iconv: illegal input sequence at position 1

for an input stream where the second byte (position 1) is 0xF3 (lowercase letter o with acute accent). Testing the other ANSI-named values for the -f parameter, I find:

ANSI_X3.4       same error message
ANSI_X3.4-1968  same error message
ANSI_X3.4-1986  same error message
ANSI_X3.110     no error but converts to UTF8 0xC3 0xB0 (lowercase eth)
ANSI_X3.110-1983 (same as ANSI_X3.110)
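For comparison, ANSI_X3.4-1968 (and -1986) is the formal name of 7-bit US-ASCII, which defines no byte above 0x7F, so rejecting 0xF3 at offset 1 is the expected result for those names; ANSI_X3.110 is an unrelated character set that, per the output above, assigns 0xF3 to eth. A minimal Python sketch, assuming "Windows ANSI" here means code page 1252 (iconv names CP1252 or WINDOWS-1252), shows the same byte failing as ASCII but decoding to ó under cp1252. The helper first_rejected_offset and the sample bytes are hypothetical.

```python
def first_rejected_offset(data: bytes, charset: str):
    """Offset of the first byte `charset` cannot decode, or None if all decode."""
    try:
        data.decode(charset)
    except UnicodeDecodeError as err:
        return err.start
    return None


sample = b"x\xf3y"  # hypothetical input; offset 1 holds 0xF3

print(first_rejected_offset(sample, "ascii"))   # 1, cf. iconv's "position 1"
print(first_rejected_offset(sample, "cp1252"))  # None
print(sample.decode("cp1252").encode("utf-8").hex(" "))  # 78 c3 b3 79
```

In cp1252, 0xF3 is U+00F3, which UTF-8 encodes as 0xC3 0xB3 rather than the 0xC3 0xB0 (eth) produced by ANSI_X3.110.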

I'm not sufficiently familiar with the internals of ANSI 8-bit encodings to know whether this behaviour is correct (meaning I have something else undefined) or whether it's a bug.

///Peter



