
Re: [nmh-workers] nmh 1.7.1: both bcc and dcc broken for mts sendmail/pipe


From: Ken Hornstein
Subject: Re: [nmh-workers] nmh 1.7.1: both bcc and dcc broken for mts sendmail/pipe
Date: Thu, 14 Feb 2019 21:08:41 -0500

>The «"» around `Blind-Carbon-Copy' should be \(lq and \(rq, or the
>equivalent strings for consistency with the style used at start of the
>paragraph.

So, on a mostly unrelated note ... I couldn't help noticing that Ralph
used guillemets («») in one of his messages on this thread (way to push
non-US-ASCII characters, Ralph!), and after a series of replies to his note
things devolved into classic mojibake.  And since hopefully most everyone
on this thread is an nmh user, I wanted to understand why, because really
that shouldn't have happened.

I went back to the raw archives (ftp://lists.gnu.org/nmh-workers/2019-02)
because the mailing list software will sometimes translate stuff into
base64 encoding when it sees non-ASCII characters.  And, well, I hate to
assign blame, but I think it's a bit unavoidable ... please, don't anyone
take this as a personal attack, I am just trying to understand how we
could do better.

Ralph's original note containing the guillemets (Message-Id
<address@hidden>) was text/plain, with a
character set of utf-8, and encoded using quoted-printable.  The
characters were encoded properly using quoted-printable; specifically,
they were listed as =C2=AB and =C2=BB.

Valdis was the first reply to that (Message-ID
<address@hidden>), and HIS email was text/plain,
character set iso-8859-1, and encoded using quoted-printable.  He quoted
Ralph's message, and the guillemets were encoded as =AB and =BB.  Which seems
correct to me.
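
To make the two encodings concrete, here is a small Python sketch that
reproduces both byte sequences.  It uses the standard quopri module purely
for illustration; it says nothing about which tools Ralph or Valdis
actually used.

```python
import quopri

# The left and right guillemets, U+00AB and U+00BB.
guillemets = "\u00ab\u00bb"

# Same two characters, two different charsets, both quoted-printable.
utf8_qp = quopri.encodestring(guillemets.encode("utf-8"))
latin1_qp = quopri.encodestring(guillemets.encode("iso-8859-1"))

print(utf8_qp)    # two bytes per character, as in Ralph's note
print(latin1_qp)  # one byte per character, as in Valdis's reply
```

Both results are correct as long as the charset parameter in the
Content-Type header matches the bytes that were encoded.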

Paul Fox replied to Valdis's note (Message-Id
<address@hidden>), and THAT note
was text/plain, character set UTF-8, encoded using quoted-printable ...
but it seems like this was the start of where things went off the rails.
The original line in Valdis's email was (in raw form):

   > The =AB=22=BB around ...

But in Paul's note it ended up as (extra > added in the reply)

   > > The  =AB" =BB around 

This is NOT correct.  First, there is an extra space in front of
the encoded bytes.  Second, they're not valid UTF-8; they're the raw
ISO-8859-1 bytes (0xAB and 0xBB), and neither byte can stand on its own
in UTF-8.  So I am guessing whatever Paul used to quote the reply
didn't translate the ISO-8859-1 characters properly into UTF-8.
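
As a rough sketch of what a correct re-quoting step could look like
(an assumption about the general approach, not a description of any
particular mailer): decode the quoted-printable body using the charset
the original message declared, and only then re-encode as UTF-8.  The
variable names below are made up for the example.

```python
import quopri

# Paul's line as it appears in the raw archive, declared as UTF-8.
paul_raw = b'> > The  =AB" =BB around'
paul_bytes = quopri.decodestring(paul_raw)

try:
    paul_bytes.decode("utf-8")
    paul_is_valid_utf8 = True
except UnicodeDecodeError:
    # 0xAB and 0xBB are continuation bytes with no lead byte before them.
    paul_is_valid_utf8 = False

# Valdis's line, which was declared (correctly) as iso-8859-1.
valdis_raw = b"> The =AB=22=BB around"
text = quopri.decodestring(valdis_raw).decode("iso-8859-1")

# Quote the decoded text, then encode it in the reply's charset.
requoted = quopri.encodestring(("> " + text).encode("utf-8"))
print(paul_is_valid_utf8)
print(requoted)
```

The key point is that the charset conversion happens on decoded text,
never on the raw bytes of the message being quoted.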

However, whatever Mark Bergman uses for email actually made an intelligent
decision.  When he replied to Paul's note, those invalid UTF-8 characters
got converted to the Unicode Replacement Character (U+FFFD), which was
sent out as =EF=BF=BD (utf-8, quoted-printable).
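
The behavior described above can be reproduced in Python with the
"replace" error handler.  This is only an illustration of the
technique; what Mark's mail program really does internally is unknown
here.

```python
import quopri

# The two stray ISO-8859-1 bytes around a double quote, as in Paul's note.
invalid = b'\xab"\xbb'

# Decoding as UTF-8 with errors="replace" substitutes U+FFFD for each
# byte that cannot be decoded, instead of raising an exception.
replaced = invalid.decode("utf-8", errors="replace")

# Re-encode as UTF-8 and quoted-printable for sending.
qp = quopri.encodestring(replaced.encode("utf-8"))
print(qp)
```

Note that the substitution is lossy: once the bytes have become U+FFFD,
nothing in the message records which characters they used to be.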

Further muddying the waters ... when Ralph replied to Mark's email,
those Unicode Replacement Characters somehow got converted back to
the correct guillemets (=C2=AB and =C2=BB).  Which means Ralph has
perhaps the most intelligent reply quoting program ever and he should
immediately share it as it would revolutionize AI, or he went back and
manually corrected it when he replied to Mark's note.  I'm 50/50 on
which one of those scenarios is more likely.

If anyone involved with this email thread wants to pipe up with some
more explanation on what exactly they used to compose their email
replies, I would love to hear it.  No judgements; I just want to know
how nmh could help everyone do better.  Like, do we need to include
better tools for composing reply messages?  Well, duh, the answer to
that is "yes", and I think replyfilter does ok here but obviously we
need to do better.  But if we're SENDING something that is not valid
UTF-8, should we be smarter and flag it?  People were upset when we
refused to send out 8-bit characters when your locale was US-ASCII (I
mean, REALLY?  I couldn't believe it), so I don't know what makes sense.
Sending out invalid UTF-8 just seems wrong to me.
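
For what it's worth, the check itself is cheap.  Here is a minimal
Python sketch of a pre-send validation; first_invalid_offset is a
hypothetical helper written for this example and is not an existing nmh
function or option.

```python
from typing import Optional


def first_invalid_offset(body: bytes, charset: str = "utf-8") -> Optional[int]:
    """Return the offset of the first undecodable byte, or None if clean.

    Hypothetical helper for illustration only; not part of nmh.
    """
    try:
        body.decode(charset)
    except UnicodeDecodeError as err:
        return err.start
    return None


# Decoded body of Paul's line: stray ISO-8859-1 bytes in a UTF-8 message.
bad = b'> > The  \xab" \xbb around'
# The same text with the guillemets properly encoded as UTF-8.
good = '> > The \u00ab"\u00bb around'.encode("utf-8")

print(first_invalid_offset(bad))
print(first_invalid_offset(good))
```

Whether a non-None result should block the send, produce a warning, or
just be ignored is the policy question; the sketch only shows that
detecting the problem is straightforward.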

--Ken


