bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
From: Andy Moreton
Subject: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Date: Wed, 01 May 2019 01:35:09 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (windows-nt)

On Tue 30 Apr 2019, Paul Eggert wrote:
> The attachment has a text/* media type but it has no charset parameter.
> The patch itself (output by git format-patch) says its charset is UTF-8.
> Unfortunately, Gnus doesn't recognize the patch as UTF-8 and so
> mishandles the non-ASCII characters in the attachment. To reproduce the
> problem, read this email with Gnus; the full attachment is attached to
> this email in the Thunderbird way.
>
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6657 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.
>
> Unfortunately Gnus apparently doesn't default to UTF-8 for such
> attachments, which means that sending a text/x-patch attachment from
> Thunderbird to Gnus messes up if the attachment contains non-ASCII
> characters. This has been causing problems on the Emacs mailing list for
> years and it bit a correspondent of mine again today; see
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35502#35>.
>
> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.

After a bit of experimenting, I found that this minimal patch appears to fix
things. Should it also let the user choose the charset to assume when none
is specified, or just hardwire it to utf-8?

diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
index 3f255419e7..a99d52a7e7 100644
--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -665,6 +665,9 @@ mm-dissect-buffer
         (setq type (split-string (car ctl) "/"))
         (setq subtype (cadr type)
               type (car type))
+        ;; Fix missing charset in Thunderbird
+        (unless (assq 'charset (cdr ctl))
+          (push '(charset . utf-8) (cdr ctl)))
         (setq
          result
          (cond
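
As written, the patch adds the parameter to every part that lacks one, whatever its media type. On the user-option question, here is one possible variant. It is only a minimal sketch. It assumes the parsed Content-Type has the (TYPE (ATTRIBUTE . VALUE) ...) shape that the patch relies on via (car ctl) and (cdr ctl). The names `my-mm-default-text-charset' and `my-mm-add-default-charset' are hypothetical and are not existing Gnus options or functions. The sketch applies the default only to text/* parts, and it lets the user pick the charset or turn the default off.

```elisp
;;; Hypothetical sketch: default a missing charset for text/* parts only.
;;; `my-mm-default-text-charset' and `my-mm-add-default-charset' are
;;; illustrative names, not part of Gnus.

(defcustom my-mm-default-text-charset 'utf-8
  "Charset assumed for text/* parts that have no charset parameter.
If nil, such parts are left unchanged."
  :type '(choice (const :tag "None" nil) symbol)
  :group 'mime-display)

(defun my-mm-add-default-charset (ctl)
  "Return CTL with a default charset parameter added when it is missing.
CTL is a parsed Content-Type such as (\"text/x-patch\" (name . \"a.patch\")).
Only text/* types get the default; any other CTL is returned as is."
  (if (and my-mm-default-text-charset
           ;; Media types are case-insensitive, so ignore case here.
           (string-prefix-p "text/" (car ctl) t)
           (not (assq 'charset (cdr ctl))))
      ;; Build a new list rather than modifying CTL in place.
      (cons (car ctl)
            (cons (cons 'charset my-mm-default-text-charset)
                  (cdr ctl)))
    ctl))
```

For example, ("text/x-patch" (name . "0001-fix.patch")) comes back as ("text/x-patch" (charset . utf-8) (name . "0001-fix.patch")), while an image/png part, or a text part that already names its charset, is returned unchanged. As in the patch, the charset is stored as a symbol.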