GNU bug report logs - #35507
Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird


Packages: emacs, gnus;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Tue, 30 Apr 2019 19:22:02 UTC

Severity: minor

Tags: fixed

Found in version 27

Done: "Basil L. Contovounesios" <contovob <at> tcd.ie>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 35507 in the body.
You can then email your comments to 35507 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Tue, 30 Apr 2019 19:22:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org. (Tue, 30 Apr 2019 19:22:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: submit <at> debbugs.gnu.org
Subject: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Date: Tue, 30 Apr 2019 12:20:58 -0700
[Message part 1 (text/plain, inline)]
Package: emacs,gnus
Version: 27

When I send email from Thunderbird with a patch attachment, Thunderbird
puts something like the following into the email:

  --------------AA6C74B60F40E0D600CCD03A
  Content-Type: text/x-patch;
   name="0001-Fix-decode-time-encode-time-roundtrip-on-macOS.patch"
  Content-Transfer-Encoding: 8bit
  Content-Disposition: attachment;
   filename*0="0001-Fix-decode-time-encode-time-roundtrip-on-macOS.patch"

  From 325f51c84d9ad4d9776784bd324b347ffe4fe51b Mon Sep 17 00:00:00 2001
  From: Paul Eggert <eggert <at> cs.ucla.edu>
  Date: Tue, 30 Apr 2019 10:45:48 -0700
  Subject: [PATCH] Fix decode-time/encode-time roundtrip on macOS
  MIME-Version: 1.0
  Content-Type: text/plain; charset=UTF-8
  Content-Transfer-Encoding: 8bit

  * src/timefns.c (Fencode_time): Ignore DST flag when the zone is
  ...

The attachment has a text/* media type but it has no charset parameter.
The patch itself (output by git format-patch) says its charset is UTF-8.
Unfortunately, Gnus doesn't recognize the patch as UTF-8 and so
mishandles the non-ASCII characters in the attachment. To reproduce the
problem, read this email with Gnus; the full attachment is attached to
this email in the Thunderbird way.

Although Internet RFC 2046 section 4.1.2 says the default charset for
text/* media types is US-ASCII, Internet RFC 6557 section 3 amends this
to say that registered text/* media types should require a charset
specification (or should say it's not needed because the payload has
that info, which obviously doesn't apply here). It later says that if
there is a strong reason to have a charset default, the default should
be UTF-8.

Unfortunately Gnus apparently doesn't default to UTF-8 for such
attachments, which means that sending a text/x-patch attachment from
Thunderbird to Gnus messes up if the attachment contains non-ASCII
characters. This has been causing problems on the Emacs mailing list for
years and it bit a correspondent of mine again today; see
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35502#35>.

I have filed a Thunderbird bug report for this, as Thunderbird should
specify a charset; see
<https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
should be a polite citizen and handle these attachments nicely rather
than converting the non-ASCII UTF-8 characters to mojibake.

[0001-Fix-decode-time-encode-time-roundtrip-on-macOS.patch (text/x-patch, attachment)]


Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 01:35:09 +0100
On Tue 30 Apr 2019, Paul Eggert wrote:

> The attachment has a text/* media type but it has no charset parameter.
> The patch itself (output by git format-patch) says its charset is UTF-8.
> Unfortunately, Gnus doesn't recognize the patch as UTF-8 and so
> mishandles the non-ASCII characters in the attachment. To reproduce the
> problem, read this email with Gnus; the full attachment is attached to
> this email in the Thunderbird way.
>
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6557 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.
>
> Unfortunately Gnus apparently doesn't default to UTF-8 for such
> attachments, which means that sending a text/x-patch attachment from
> Thunderbird to Gnus messes up if the attachment contains non-ASCII
> characters. This has been causing problems on the Emacs mailing list for
> years and it bit a correspondent of mine again today; see
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35502#35>.
>
> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.

After a bit of experimenting, this minimal patch appears to fix things.
Should this also allow the user to choose the charset if none is
specified, or just hardwire it to utf-8 ?

diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
index 3f255419e7..a99d52a7e7 100644
--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -665,6 +665,9 @@ mm-dissect-buffer
 	(setq type (split-string (car ctl) "/"))
 	(setq subtype (cadr type)
 	      type (car type))
+        ;; Fix missing charset in Thunderbird
+        (unless (assq 'charset (cdr ctl))
+          (push '(charset . utf-8) (cdr ctl)))
 	(setq
 	 result
 	 (cond






Message #11 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 17:22:19 +0200
>>>>> On Wed, 01 May 2019 01:35:09 +0100, Andy Moreton <andrewjmoreton <at> gmail.com> said:
    Andy> After a bit of experimenting, this minimal patch appears to
    Andy> fix things.  Should this also allow the user to choose the
    Andy> charset if none is specified, or just hardwire it to utf-8 ?

I think utf-8 is a good fallback if the message doesnʼt specify a
charset. Itʼs not going to produce any worse effects than what we have
now.

Robert





Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 16:45:28 +0100
On Wed 01 May 2019, Robert Pluim wrote:

>>>>>> On Wed, 01 May 2019 01:35:09 +0100, Andy Moreton <andrewjmoreton <at> gmail.com> said:
>     Andy> After a bit of experimenting, this minimal patch appears to
>     Andy> fix things.  Should this also allow the user to choose the
>     Andy> charset if none is specified, or just hardwire it to utf-8 ?
>
> I think utf-8 is a good fallback if the message doesnʼt specify a
> charset. Itʼs not going to produce any worse effects than what we have
> now.

Looking at this a bit more, the " *mm*" temp buffers produced when
decoding the MIME parts all seem to have the right coding, so my
previous patch looks wrong.

The problem may be in `mm-display-inline-fontify' when it tries to
choose a charset or coding system to display the MIME part inline.

    AndyM






Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 17:42:18 +0100
On Wed 01 May 2019, Andy Moreton wrote:

> On Wed 01 May 2019, Robert Pluim wrote:
>
>>>>>>> On Wed, 01 May 2019 01:35:09 +0100, Andy Moreton <andrewjmoreton <at> gmail.com> said:
>>     Andy> After a bit of experimenting, this minimal patch appears to
>>     Andy> fix things.  Should this also allow the user to choose the
>>     Andy> charset if none is specified, or just hardwire it to utf-8 ?
>>
>> I think utf-8 is a good fallback if the message doesnʼt specify a
>> charset. Itʼs not going to produce any worse effects than what we have
>> now.
>
> Looking at this a bit more, the " *mm*" temp buffers produced when
> decoding the MIME parts all seem to have the right coding, so my
> previous patch looks wrong.
>
> The problem may be in `mm-display-inline-fontify' when it tries to
> choose a charset or coding system to display the MIME part inline.

This patch only affects display, so should be safer:

diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
index 1e1d264b99..173ebfab48 100644
--- a/lisp/gnus/mm-view.el
+++ b/lisp/gnus/mm-view.el
@@ -475,7 +475,7 @@ mm-display-inline-fontify
 		    (charset
 		     (mm-decode-string text charset))
 		    (t
-		     text)))
+		     (mm-decode-string text 'utf-8))))
       (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
 	    (enable-local-variables nil))
         ;; We used to set font-lock-mode-hook to nil to avoid enabling






Message #20 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 20:32:22 +0300
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 30 Apr 2019 12:20:58 -0700
> 
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6557 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.

(You meant RFC 6657, I believe.)

That's not exactly my reading of the RFC language.  First, it sounds
like the text there is primarily intended for the sending MUA, not for
the receiving MUA.  And second, this text:

     In order to improve interoperability with deployed agents, "text/*"
     media type registrations SHOULD either

     a.  specify that the "charset" parameter is not used for the defined
	 subtype, because the charset information is transported inside
	 the payload (such as in "text/xml"), or

     b.  require explicit unconditional inclusion of the "charset"
	 parameter, eliminating the need for a default value.

     In accordance with option (a) above, registrations for "text/*" media
     types that can transport charset information inside the corresponding
     payloads (such as "text/html" and "text/xml") SHOULD NOT specify the
     use of a "charset" parameter, nor any default value, in order to
     avoid conflicting interpretations should the "charset" parameter
     value and the value specified in the payload disagree.

     Thus, new subtypes of the "text" media type SHOULD NOT define a
     default "charset" value.  If there is a strong reason to do so
     despite this advice, they SHOULD use the "UTF-8" [RFC3629] charset as
     the default.

     Regardless of what approach is chosen, all new "text/*" registrations
     MUST clearly specify how the charset is determined; relying on the
     default defined in Section 4.1.2 of [RFC2046] is no longer permitted.
     However, existing "text/*" registrations that fail to specify how the
     charset is determined still default to US-ASCII.

seems to say that:

  . it is preferable, for new types of text/* media, not to have any
    default charset, unless there's a strong reason to the contrary

  . all new text/* registrations must specify how the charset is
    determined, and not rely on the default from RFC 2046

Is text/x-patch a "new media type" or not?  If it is not new, then
where is it defined?  I couldn't find it on the IANA site.

If it _is_ "new", my reading of the RFC is that we should not define
or expect any defaults, which means this bug is squarely in
Thunderbird's yard, and we shouldn't change Gnus to arbitrarily assume
UTF-8 as the default.

> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.

Does Gnus have a command to re-decode an already decoded MIME part?
If not, it should.  But other than that, I don't see why we should
change Gnus in this regard, certainly not unconditionally assuming
UTF-8.





Message #23 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 20:33:31 +0300
> From: Andy Moreton <andrewjmoreton <at> gmail.com>
> Date: Wed, 01 May 2019 01:35:09 +0100
> 
> After a bit of experimenting, this minimal patch appears to fix things.
> Should this also allow the user to choose the charset if none is
> specified, or just hardwire it to utf-8 ?
> 
> diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
> index 3f255419e7..a99d52a7e7 100644
> --- a/lisp/gnus/mm-decode.el
> +++ b/lisp/gnus/mm-decode.el
> @@ -665,6 +665,9 @@ mm-dissect-buffer
>  	(setq type (split-string (car ctl) "/"))
>  	(setq subtype (cadr type)
>  	      type (car type))
> +        ;; Fix missing charset in Thunderbird
> +        (unless (assq 'charset (cdr ctl))
> +          (push '(charset . utf-8) (cdr ctl)))

Please don't unconditionally force UTF-8 on users.  At the very least
this should be a user option, if at all.





Message #26 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 20:34:46 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Date: Wed, 01 May 2019 17:22:19 +0200
> Cc: 35507 <at> debbugs.gnu.org
> 
> I think utf-8 is a good fallback if the message doesnʼt specify a
> charset. Itʼs not going to produce any worse effects than what we have
> now.

What considerations led you to that conclusion?





Message #29 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 20:36:41 +0300
> From: Andy Moreton <andrewjmoreton <at> gmail.com>
> Date: Wed, 01 May 2019 17:42:18 +0100
> 
> +		     (mm-decode-string text 'utf-8))))

As I said, I'm not sure we should do this, let alone unconditionally
force UTF-8 here, but if we must, why not use decode-coding-string?
Do we really need the mm-* stuff?





Message #32 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 1 May 2019 11:26:35 -0700
On 5/1/19 10:32 AM, Eli Zaretskii wrote:
> Is text/x-patch a "new media type" or not? 

It's not a registered media type so strictly speaking the RFCs' SHOULD
statements do not apply (and they are SHOULDs not MUSTs so they could be
disregarded for good reason). That being said, the ordinary and usual
intent is for the x- media types to follow these recommendations and my
bug report was filed under that assumption.

> my reading of the RFC is that we should not define
> or expect any defaults, which means this bug is squarely in
> Thunderbird's yard

Ah, sorry, I see that my bug report misstated a point. This particular
patch clearly identifies its own encoding because its header says
"Content-Type: text/plain; charset=UTF-8". (I think Git-generated
patches always specify an encoding unless it's ASCII.) So in this
particular case the RFC's recommendation seems to be respected by the
sender.

Gnus could look for a Content-Type: header in text bodies that do not
specify charsets; this would follow the Internet's robustness principle
better.
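
A rough sketch of that idea, in Python rather than Emacs Lisp and purely as an illustration: parse the attachment body as if it were a message and read the charset from its embedded Content-Type header. The function name `sniff_embedded_charset` and both sample payloads are hypothetical; this is not how Gnus is implemented.

```python
from email import message_from_bytes


def sniff_embedded_charset(payload: bytes):
    """Return the charset named inside the payload's own headers, or None.

    Hypothetical helper: treats the attachment body (for example the
    output of git format-patch) as an RFC 822 style message.
    """
    msg = message_from_bytes(payload)
    # get_content_charset() returns the lowercased charset parameter of
    # Content-Type, or None when the header or the parameter is absent.
    return msg.get_content_charset()


# Invented sample resembling git format-patch output.
git_patch = (
    b"From 0123456789abcdef0123456789abcdef01234567 Mon Sep 17 00:00:00 2001\n"
    b"From: A. Hacker <hacker@example.org>\n"
    b"Subject: [PATCH] Sample change\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=UTF-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Sample commit message body.\n"
)
# Invented sample with no embedded headers at all.
bare_diff = b"diff --git a/x b/x\n--- a/x\n+++ b/x\n"

print(sniff_embedded_charset(git_patch))  # charset declared by the patch
print(sniff_embedded_charset(bare_diff))  # nothing declared
```

A caller would still need a fallback for the second case, which is what the rest of this thread is about.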

> I don't see why we should
> change Gnus in this regard, certainly not unconditionally assuming
> UTF-8.
Gnus is mishandling emails sent from Thunderbird right now, so it would
be a practical benefit for Gnus users if it did a better job of decoding
these admittedly-iffy messages.

These days, UTF-8 is by far the most common encoding specified for
non-ASCII text in email and its popularity is growing, so it's the best
choice for a default if Gnus will have one - certainly better than the
confusing behavior that Robert Pluim observed in his Gnus session.
Gnus's current behavior may have been a good idea in 1996 when RFC 2046
said US-ASCII was the default, but it stopped being a good idea in 2012
when RFC 6657 came out and said that UTF-8 should be the default if
there is a default.

Another possibility is that Gnus could ask the user which encoding to
use when the email headers don't specify one and when the text is not
ASCII; even that would be better than Gnus's current behavior of forcing
US-ASCII and displaying something like "\xe2\x80\x99" when it encounters
a non-ASCII character.
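
To make the byte sequence above concrete, here is a small Python sketch (Python is used only for illustration; it is not part of Gnus). It shows that the three bytes "\xe2\x80\x99" are the UTF-8 encoding of U+2019, and what those bytes turn into when decoded with the wrong charset. The variable names are made up for this example, and Windows-1252 is just one plausible wrong guess.

```python
# The UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK.
raw = b"\xe2\x80\x99"

# Decoded with the charset the sender actually used: one character.
as_utf8 = raw.decode("utf-8")

# Decoded as Windows-1252: three unrelated characters (mojibake).
as_cp1252 = raw.decode("cp1252")

# Treated as US-ASCII, the bytes are simply invalid (all are >= 0x80).
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(as_utf8), repr(as_cp1252), ascii_ok)
```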






Message #35 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 22:05:59 +0300
> Cc: 35507 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Wed, 1 May 2019 11:26:35 -0700
> 
> > I don't see why we should
> > change Gnus in this regard, certainly not unconditionally assuming
> > UTF-8.
> Gnus is mishandling emails sent from Thunderbird right now, so it would
> be a practical benefit for Gnus users if it did a better job of decoding
> these admittedly-iffy messages.
> 
> These days, UTF-8 is by far the most common encoding specified for
> non-ASCII text in email and its popularity is growing, so it's the best
> choice for a default if Gnus will have one - certainly better than the
> confusing behavior that Robert Pluim observed in his Gnus session.
> Gnus's current behavior may have been a good idea in 1996 when RFC 2046
> said US-ASCII was the default, but it stopped being a good idea in 2012
> when RFC 6657 came out and said that UTF-8 should be the default if
> there is a default.
> 
> Another possibility is that Gnus could ask the user which encoding to
> use when the email headers don't specify one and when the text is not
> ASCII; even that would be better than Gnus's current behavior of forcing
> US-ASCII and displaying something like "\xe2\x80\x99" when it encounters
> a non-ASCII character.

I'm okay with having a default that's customizable.  I also think Gnus
should have a feature that allows the user to request "re-decoding" of
a message part, because no matter how smart we and our defaults are,
they will sometimes fail.





Message #38 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 21:47:58 +0200
On May 01 2019, Eli Zaretskii <eliz <at> gnu.org> wrote:

> I'm okay with having a default that's customizable.  I also think Gnus
> should have a feature that allows the user to request "re-decoding" of
> a message part, because no matter how smart we and our defaults are,
> they will sometimes fail.

That already exists (K C runs the command
gnus-article-view-part-as-charset).

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





Message #41 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 35507 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 22:57:46 +0300
> From: Andreas Schwab <schwab <at> linux-m68k.org>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>,  35507 <at> debbugs.gnu.org
> Date: Wed, 01 May 2019 21:47:58 +0200
> 
> On May 01 2019, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
> > I'm okay with having a default that's customizable.  I also think Gnus
> > should have a feature that allows the user to request "re-decoding" of
> > a message part, because no matter how smart we and our defaults are,
> > they will sometimes fail.
> 
> That already exists (K C runs the command
> gnus-article-view-part-as-charset).

Great, thanks.





Message #44 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Thu, 02 May 2019 00:54:56 +0100
On Wed 01 May 2019, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton <at> gmail.com>
>> Date: Wed, 01 May 2019 17:42:18 +0100
>> 
>> +		     (mm-decode-string text 'utf-8))))
>
> As I said, I'm not sure we should do this, let alone unconditionally
> force UTF-8 here, but if we must, why not use decode-coding-string?
> Do we really need the mm-* stuff?

No idea - I am not at all expert in coding systems or the internals of
Gnus.

This was the simplest patch that appeared to work for producing the
right display, without changing the decode into the " *mm*" prefixed
temp buffers created by the MIME machinery for each part.

If you think `decode-coding-string' is a better patch, feel free to test
and commit that instead.

    AndyM






Message #47 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507 <at> debbugs.gnu.org, Andy Moreton <andrewjmoreton <at> gmail.com>
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 01 May 2019 23:07:32 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Andy Moreton <andrewjmoreton <at> gmail.com>
>> Date: Wed, 01 May 2019 17:42:18 +0100
>> 
>> +		     (mm-decode-string text 'utf-8))))
>
> As I said, I'm not sure we should do this, let alone unconditionally
> force UTF-8 here, but if we must, why not use decode-coding-string?
> Do we really need the mm-* stuff?

As far as I can tell, the mm-* version is useful for handling stuff like
"UTF-8" as the charset argument (which might be useful if we extract it
from the "Content-Type: text/plain; charset=UTF-8" header).  If passing
'utf-8, then it's just the same as calling decode-coding-string.

For a default if we don't find a charset header, I guess `undecided'
would make more sense, right?  After all, Emacs already has the coding
detection machinery, may as well use it.





Message #50 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Thu, 02 May 2019 08:35:27 +0200
>>>>> On Wed, 01 May 2019 20:34:46 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Date: Wed, 01 May 2019 17:22:19 +0200
    >> Cc: 35507 <at> debbugs.gnu.org
    >> 
    >> I think utf-8 is a good fallback if the message doesnʼt specify
    >> a charset. Itʼs not going to produce any worse effects than
    >> what we have now.

    Eli> What considerations led you to that conclusion?

If the message requires a charset, gnus might produce
mojibake. Assuming utf-8 reduces the chance of that happening. Itʼs
true that in particular cases a different charset should be used, but
in that case the existing assumption of ASCII is wrong as well.

Robert





Message #53 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Thu, 02 May 2019 08:17:51 +0100
On Wed 01 May 2019, Noam Postavsky wrote:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>>> From: Andy Moreton <andrewjmoreton <at> gmail.com>
>>> Date: Wed, 01 May 2019 17:42:18 +0100
>>> 
>>> +		     (mm-decode-string text 'utf-8))))
>>
>> As I said, I'm not sure we should do this, let alone unconditionally
>> force UTF-8 here, but if we must, why not use decode-coding-string?
>> Do we really need the mm-* stuff?
>
> As far as I can tell, the mm-* version is useful for handling stuff like
> "UTF-8" as the charset argument (which might be useful if we extract it
> from the "Content-Type: text/plain; charset=UTF-8" header).  If passing
> 'utf-8, then it's just the same as calling decode-coding-string.

OK, in that case we could indeed just call decode-coding-string.

> For a default if we don't find a charset header, I guess `undecided'
> would make more sense, right?  After all, Emacs already has the coding
> detection machinery, may as well use it.

Please re-read the original bug report: the problem is with malformed
messages that do not contain a charset field in the Content-Type header.

The one-liner patch changes the default for inline display in the
Gnus article buffer to assume UTF-8 when nothing is specified, rather
than just inserting the text without decoding it.

That should result in text that actually is UTF-8 being displayed
correctly, and no change to plain ASCII. For anything else, the user can
use the `gnus-mime-view-part-as-charset' command to override the
default.
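
The claim that plain ASCII is unaffected can be checked with a short Python sketch (illustration only, not Gnus code). Every ASCII byte string decodes to the same text under UTF-8 as under US-ASCII, so a UTF-8 fallback changes the result only for bytes >= 0x80. The helper name `decode_with_fallback` and the sample strings are invented for this example.

```python
def decode_with_fallback(data: bytes, fallback: str = "utf-8") -> str:
    """Decode a part that carries no charset parameter using FALLBACK.

    Hypothetical helper for illustration; undecodable bytes are replaced
    with U+FFFD instead of raising an error.
    """
    return data.decode(fallback, errors="replace")


ascii_part = b"plain ASCII patch text\n"
utf8_part = "caf\u00e9 \u2019quoted\u2019\n".encode("utf-8")
latin1_part = "caf\u00e9".encode("latin-1")

# ASCII input: identical result whether we assume US-ASCII or UTF-8.
same_for_ascii = decode_with_fallback(ascii_part) == ascii_part.decode("ascii")

# Genuine UTF-8 input: decoded without loss.
round_trips = decode_with_fallback(utf8_part).encode("utf-8") == utf8_part

# Anything else (here Latin-1) is not recovered by the fallback and
# still needs a manual override by the user.
latin1_result = decode_with_fallback(latin1_part)

print(same_for_ascii, round_trips, repr(latin1_result))
```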

    AndyM






Message #56 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: bug-gnu-emacs <at> gnu.org, Andy Moreton <andrewjmoreton <at> gmail.com>,
 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Thu, 02 May 2019 14:04:26 +0300
On May 2, 2019 10:17:51 AM GMT+03:00, Andy Moreton <andrewjmoreton <at> gmail.com> wrote:
> On Wed 01 May 2019, Noam Postavsky wrote:
> 
> > Eli Zaretskii <eliz <at> gnu.org> writes:
> >
> >>> From: Andy Moreton <andrewjmoreton <at> gmail.com>
> >>> Date: Wed, 01 May 2019 17:42:18 +0100
> >>> 
> >>> +		     (mm-decode-string text 'utf-8))))
> >>
> >> As I said, I'm not sure we should do this, let alone
> unconditionally
> >> force UTF-8 here, but if we must, why not use decode-coding-string?
> >> Do we really need the mm-* stuff?
> >
> > As far as I can tell, the mm-* version is useful for handling stuff
> lke
> > "UTF-8" as the charset argument (which might be useful if we extract
> it
> > from the "Content-Type: text/plain; charset=UTF-8" header).  If
> passing
> > 'utf-8, then it's just the same as calling decode-coding-string.
> 
> OK, in that case we could indeed just call decode-coding-string.
> 
> > For a default if we don't find a charset header, I guess `undecided'
> > would make more sense, right?  After all, Emacs already has the
> coding
> > detection machinery, may as well use it.
> 
> Please re-read the original bug report: the problem is with malformed
> messages that do not contain a charset field in the Content-Type
> header.
> 
> The one-liner patch changes the default for inline display in the
> Gnus article buffer to assume UTF-8 when nothing is specified, rather
> than just inserting the text without decoding it.
> 
> That should result in text that actually is UTF-8 being displayed
> correctly, and no change to plain ASCII. For anything else, the user can
> use the `gnus-mime-view-part-as-charset' command to override the
> default.
> 
>     AndyM

Using 'undecided' doesn't disable decoding, it just means Emacs will try to detect the correct encoding by looking at the text (not at the charset header).  In a UTF-8 locale, we will guess UTF-8 anyway, unless we see invalid sequences.

So yes, I think Noam is right, and 'undecided' is a better alternative here.
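
The difference between inserting the raw text and decoding it with 'undecided' can be sketched as follows. This is a minimal illustration, not code from Gnus: the sample bytes and the `my/` names are hypothetical, and `with-coding-priority' is used only to simulate a setup that prefers UTF-8.

```elisp
(require 'mule-util)                    ; provides `with-coding-priority'

;; Hypothetical sample: the two bytes that encode U+00E9 in UTF-8, as
;; they would arrive undecoded from an 8bit MIME part.
(defvar my/raw-bytes "\303\251"
  "A unibyte string holding undecoded UTF-8 (illustration only).")

;; Without decoding, Emacs treats these as two separate bytes.
(defvar my/raw-length (length my/raw-bytes))

;; With `undecided', Emacs inspects the bytes themselves and picks a
;; coding system; here we assume UTF-8 ranks first in the priority list.
(defvar my/decoded
  (with-coding-priority '(utf-8)
    (decode-coding-string my/raw-bytes 'undecided)))
```

Under that assumption the decoded string holds a single character, whereas the raw string is what previously ended up in the article buffer.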




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 11:05:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 12:02:02 GMT) Full text and rfc822 format available.

Message #62 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Thu, 02 May 2019 08:01:38 -0400
Andy Moreton <andrewjmoreton <at> gmail.com> writes:

> On Wed 01 May 2019, Noam Postavsky wrote:
>>
>> As far as I can tell, the mm-* version is useful for handling stuff like
>> "UTF-8" as the charset argument (which might be useful if we extract it
>> from the "Content-Type: text/plain; charset=UTF-8" header).  If passing
>> 'utf-8, then it's just the same as calling decode-coding-string.
>
> OK, in that case we could indeed just call decode-coding-string.
>
>> For a default if we don't find a charset header, I guess `undecided'
>> would make more sense, right?  After all, Emacs already has the coding
>> detection machinery, may as well use it.
>
> Please re-read the original bug report: the problem is with malformed
> messages that do not contain a charset field in the Content-Type header.

I understood from Paul's followup in https://debbugs.gnu.org/35507#32
that the report is mainly about the case where there is a Content-Type
header with a charset field within the body of the attachment.





Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 15:41:02 GMT) Full text and rfc822 format available.

Message #65 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Thu, 02 May 2019 18:40:24 +0300
> From: Noam Postavsky <npostavs <at> gmail.com>
> Date: Thu, 02 May 2019 08:01:38 -0400
> Cc: 35507 <at> debbugs.gnu.org
> 
> I understood from Paul's followup in https://debbugs.gnu.org/35507#32
> that the report is mainly about the case where there is a Content-Type
> header with a charset field within the body of the attachment.

Yes, that's my understanding as well.  So I guess Gnus should try
gleaning the charset from there.  The 'undecided' stuff is for when it
fails, I think.
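
Gleaning the charset from a Content-Type line inside the attachment body might look roughly like this. It is a sketch only: `my/charset-from-body' is a hypothetical helper, not an existing Gnus function, and the regexp is a deliberately simple assumption about how such a line is laid out.

```elisp
(defun my/charset-from-body (text)
  "Return the charset named by a Content-Type line in TEXT, or nil.
Hypothetical helper for illustration; not part of Gnus."
  (let ((case-fold-search t))
    (when (string-match
           "^Content-Type:[^\n]*charset=\"?\\([-_.[:alnum:]]+\\)" text)
      (let ((cs (intern (downcase (match-string 1 text)))))
        ;; Only return names that Emacs recognizes as coding systems.
        (and (coding-system-p cs) cs)))))

;; Example: the body of a patch produced by git format-patch often
;; carries its own MIME headers.
(my/charset-from-body
 "Subject: [PATCH] Fix\nContent-Type: text/plain; charset=UTF-8\n\nbody")
```

When the helper returns nil, the caller would fall back to 'undecided'.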




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 15:44:02 GMT) Full text and rfc822 format available.

Message #68 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 16:43:31 +0100
On Thu 02 May 2019, Eli Zaretskii wrote:

> Using 'undecided' doesn't disable decoding, it just means Emacs will try to
> detect the correct encoding by looking at the text (not at the charset
> header). In a UTF-8 locale, we will guess UTF-8 anyway, unless we see invalid
> sequences.
>
> So yes, I think Noam is right, and 'undecided' is a better alternative here.

That is arguing for the existing code, which does not work correctly.

The problem is in `mm-display-inline-fontify'.

I am disinclined to look any further at this, as nobody else appears to
be running the existing code before commenting, or testing the proposed
patch.

    AndyM





Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 15:58:01 GMT) Full text and rfc822 format available.

Message #71 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 18:57:23 +0300
> From: Andy Moreton <andrewjmoreton <at> gmail.com>
> Date: Thu, 02 May 2019 16:43:31 +0100
> 
> > So yes, I think Noam is right, and 'undecided' is a better alternative here.
> 
> That is arguing for the existing code, which does not work correctly.

No, the existing code simply uses the undecoded string.

What I argue for is to do this:

diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
index 1e1d264b99..173ebfab48 100644
--- a/lisp/gnus/mm-view.el
+++ b/lisp/gnus/mm-view.el
@@ -475,7 +475,7 @@ mm-display-inline-fontify
 		    (charset
 		     (mm-decode-string text charset))
 		    (t
-		     text)))
+		     (mm-decode-string text 'undecided))))
       (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
 	    (enable-local-variables nil))
         ;; We used to set font-lock-mode-hook to nil to avoid enabling

> I am disinclined to look any further at this, as nobody else appears to
> be running the existing code before commenting, or testing the proposed
> patch.

Please don't be offended, there's no intent to offend you here.  Your
efforts are greatly appreciated.  We are just discussing a small
change to what you were proposing, see above.

Or are you saying that using undecided as above doesn't do the job?

(Sorry, I don't use Gnus, so to be able to reproduce the problem and
test a proposed solution I need detailed instructions, I cannot easily
do it myself without investing an inordinate amount of time.)




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 16:09:02 GMT) Full text and rfc822 format available.

Message #74 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 17:08:21 +0100
On Thu 02 May 2019, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton <at> gmail.com>
>> Date: Thu, 02 May 2019 16:43:31 +0100
>> 
>> > So yes, I think Noam is right, and 'undecided' is a better alternative here.
>> 
>> That is arguing for the existing code, which does not work correctly.
>
> No, the existing code simply uses the undecoded string.
>
> What I argue for is to do this:
>
> diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
> index 1e1d264b99..173ebfab48 100644
> --- a/lisp/gnus/mm-view.el
> +++ b/lisp/gnus/mm-view.el
> @@ -475,7 +475,7 @@ mm-display-inline-fontify
>  		    (charset
>  		     (mm-decode-string text charset))
>  		    (t
> -		     text)))
> +		     (mm-decode-string text 'undecided))))
>        (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
>  	    (enable-local-variables nil))
>          ;; We used to set font-lock-mode-hook to nil to avoid enabling

ok, that does appear to work for the example message in the original bug
report. Please push this change and we can find out if it causes any
other problems.

>> I am disinclined to look any further at this, as nobody else appears to
>> be running the existing code before commenting, or testing the proposed
>> patch.
>
> Please don't be offended, there's no intent to offend you here.  Your
> efforts are greatly appreciated.  We are just discussing a small
> change to what you were proposing, see above.

I'm not offended, but I did want to encourage others to run the code and
test the results before adding further commentary.

> Or are you saying that using undecided as above doesn't do the job?
>
> (Sorry, I don't use Gnus, so to be able to reproduce the problem and
> test a proposed solution I need detailed instructions, I cannot easily
> do it myself without investing an inordinate amount of time.)

The gnus-mock package on GNU ELPA may be of some help for testing. However
I have not used it myself, nor investigated if its collection of test
data contains a suitably malformed message.

    AndyM





Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 16:11:01 GMT) Full text and rfc822 format available.

Message #77 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507 <at> debbugs.gnu.org, Andy Moreton <andrewjmoreton <at> gmail.com>
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 17:10:40 +0100
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Andy Moreton <andrewjmoreton <at> gmail.com>
>> Date: Thu, 02 May 2019 16:43:31 +0100
>> 
>> > So yes, I think Noam is right, and 'undecided' is a better alternative here.
>> 
>> That is arguing for the existing code, which does not work correctly.
>
> No, the existing code simply uses the undecoded string.
>
> What I argue for is to do this:
>
> diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
> index 1e1d264b99..173ebfab48 100644
> --- a/lisp/gnus/mm-view.el
> +++ b/lisp/gnus/mm-view.el
> @@ -475,7 +475,7 @@ mm-display-inline-fontify
>  		    (charset
>  		     (mm-decode-string text charset))
>  		    (t
> -		     text)))
> +		     (mm-decode-string text 'undecided))))
>        (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
>  	    (enable-local-variables nil))
>          ;; We used to set font-lock-mode-hook to nil to avoid enabling
>
>> I am disinclined to look any further at this, as nobody else appears to
>> be running the existing code before commenting, or testing the proposed
>> patch.
>
> Please don't be offended, there's no intent to offend you here.  Your
> efforts are greatly appreciated.  We are just discussing a small
> change to what you were proposing, see above.
>
> Or are you saying that using undecided as above doesn't do the job?
>
> (Sorry, I don't use Gnus, so to be able to reproduce the problem and
> test a proposed solution I need detailed instructions, I cannot easily
> do it myself without investing an inordinate amount of time.)

FWIW, I use Gnus, and your suggested change to mm-display-inline-fontify
fixes the inline display of the patch in the OP for me.  BTW, the last
two cond branches can be merged following your change:

[mm-view.diff (text/x-diff, inline)]
diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
index 1e1d264b99..849488293a 100644
--- a/lisp/gnus/mm-view.el
+++ b/lisp/gnus/mm-view.el
@@ -472,10 +472,8 @@ mm-display-inline-fontify
 		       (buffer-string)))
 		    (coding-system
 		     (decode-coding-string text coding-system))
-		    (charset
-		     (mm-decode-string text charset))
-		    (t
-		     text)))
+                    (t
+                     (mm-decode-string text (or charset 'undecided)))))
       (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
 	    (enable-local-variables nil))
         ;; We used to set font-lock-mode-hook to nil to avoid enabling
[Message part 3 (text/plain, inline)]
Thanks,

-- 
Basil
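
The merged branch behaves the same as the two separate branches, as this small sketch suggests. It uses `decode-coding-string' as a stand-in for `mm-decode-string' (which, per the discussion above, is equivalent when given a coding-system symbol); the `my/' function names are hypothetical.

```elisp
(require 'mule-util)                    ; provides `with-coding-priority'

(defun my/decode-two-branches (text charset)
  "Decode TEXT with CHARSET, or with `undecided' if CHARSET is nil.
Mirrors the shape of the cond before the branches were merged."
  (cond (charset
         (decode-coding-string text charset))
        (t
         (decode-coding-string text 'undecided))))

(defun my/decode-merged (text charset)
  "Same as `my/decode-two-branches', with the fallback folded into `or'."
  (decode-coding-string text (or charset 'undecided)))
```

Both functions take the explicit charset when one is known and only fall back to detection when it is nil.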

Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 16:31:01 GMT) Full text and rfc822 format available.

Message #80 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507 <at> debbugs.gnu.org, Andy Moreton <andrewjmoreton <at> gmail.com>
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 18:29:52 +0200
>>>>> On Thu, 02 May 2019 18:57:23 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Andy Moreton <andrewjmoreton <at> gmail.com>
    >> Date: Thu, 02 May 2019 16:43:31 +0100
    >> 
    >> > So yes, I think Noam is right, and 'undecided' is a better alternative here.
    >> 
    >> That is arguing for the existing code, which does not work correctly.

    Eli> No, the existing code simply uses the undecoded string.

    Eli> What I argue for is to do this:

    >diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
    >index 1e1d264b99..173ebfab48 100644
    >--- a/lisp/gnus/mm-view.el
    >+++ b/lisp/gnus/mm-view.el
    >@@ -475,7 +475,7 @@ mm-display-inline-fontify
    > 		    (charset
    > 		     (mm-decode-string text charset))
    > 		    (t
    >-		     text)))
    >+		     (mm-decode-string text 'undecided))))
    >       (let ((font-lock-verbose nil) ; font-lock is a bit too verbose.
    > 	    (enable-local-variables nil))
    >         ;; We used to set font-lock-mode-hook to nil to avoid enabling

That fixes things for me, thanks (I tested against Paul's original
message).

I donʼt see any need for it to be configurable, but thatʼs up to you.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 16:51:02 GMT) Full text and rfc822 format available.

Message #83 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 19:50:11 +0300
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: Andy Moreton <andrewjmoreton <at> gmail.com>,  <35507 <at> debbugs.gnu.org>
> Date: Thu, 02 May 2019 17:10:40 +0100
> 
> FWIW, I use Gnus, and your suggested change to mm-display-inline-fontify
> fixes the inline display of the patch in the OP for me.  BTW, the last
> two cond branches can be merged following your change:

Thanks.  Would you please push that, and give credit to Noam for the
suggestion?




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 16:52:01 GMT) Full text and rfc822 format available.

Message #86 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 19:50:58 +0300
> From: Andy Moreton <andrewjmoreton <at> gmail.com>
> Date: Thu, 02 May 2019 17:08:21 +0100
> 
> > diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
> > index 1e1d264b99..173ebfab48 100644
> > --- a/lisp/gnus/mm-view.el
> > +++ b/lisp/gnus/mm-view.el
> > @@ -475,7 +475,7 @@ mm-display-inline-fontify
> >  		    (charset
> >  		     (mm-decode-string text charset))
> >  		    (t
> > -		     text)))
> > +		     (mm-decode-string text 'undecided))))
> >        (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
> >  	    (enable-local-variables nil))
> >          ;; We used to set font-lock-mode-hook to nil to avoid enabling
> 
> ok, that does appear to work for the example message in the original bug
> report.

Thanks for testing.  Something along those lines will be in the
repository shortly.




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 16:54:02 GMT) Full text and rfc822 format available.

Message #89 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 19:53:31 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Andy Moreton <andrewjmoreton <at> gmail.com>,  35507 <at> debbugs.gnu.org
> Date: Thu, 02 May 2019 18:29:52 +0200
> 
> That fixes things for me, thanks (I tested against Paul's original
> message).

Thanks for testing.

> I donʼt see any need for it to be configurable, but thatʼs up to you.

No need, IMO.  That's a nice bonus of Noam's idea: 'undecided' can
already be configured via existing facilities, like
prefer-coding-system, set-language-environment, and their ilk.
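
As a rough illustration of that point, the result of 'undecided' follows whatever priority the user has configured. The sketch below uses `with-coding-priority' to apply a priority temporarily instead of calling `prefer-coding-system' globally; `my/decode-with-priority' is a hypothetical helper.

```elisp
(require 'mule-util)                    ; provides `with-coding-priority'

(defun my/decode-with-priority (bytes priority)
  "Decode BYTES with `undecided' while PRIORITY is ranked first.
PRIORITY is a list of coding systems.  Hypothetical helper."
  (with-coding-priority priority
    (decode-coding-string bytes 'undecided)))

;; A user who prefers UTF-8: valid UTF-8 bytes decode as UTF-8.
(my/decode-with-priority "\303\251" '(utf-8))

;; A user who prefers Latin-1: detection is biased toward Latin-1.
(my/decode-with-priority "\351" '(iso-8859-1))
```

The priority list is restored after each call, so the sketch does not change the session's settings.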




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 17:14:02 GMT) Full text and rfc822 format available.

Message #92 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 10:13:07 -0700
Andy Moreton <andrewjmoreton <at> gmail.com> writes:

> On Thu 02 May 2019, Eli Zaretskii wrote:
>
>>> From: Andy Moreton <andrewjmoreton <at> gmail.com>
>>> Date: Thu, 02 May 2019 16:43:31 +0100
>>> 
>>> > So yes, I think Noam is right, and 'undecided' is a better alternative here.
>>> 
>>> That is arguing for the existing code, which does not work correctly.
>>
>> No, the existing code simply uses the undecoded string.
>>
>> What I argue for is to do this:
>>
>> diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
>> index 1e1d264b99..173ebfab48 100644
>> --- a/lisp/gnus/mm-view.el
>> +++ b/lisp/gnus/mm-view.el
>> @@ -475,7 +475,7 @@ mm-display-inline-fontify
>>  		    (charset
>>  		     (mm-decode-string text charset))
>>  		    (t
>> -		     text)))
>> +		     (mm-decode-string text 'undecided))))
>>        (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
>>  	    (enable-local-variables nil))
>>          ;; We used to set font-lock-mode-hook to nil to avoid enabling
>
> ok, that does appear to work for the example message in the original bug
> report. Please push this change and we can find out if it causes any
> other problems.
>
>>> I am disinclined to look any further at this, as nobody else appears to
>>> be running the existing code before commenting, or testing the proposed
>>> patch.
>>
>> Please don't be offended, there's no intent to offend you here.  Your
>> efforts are greatly appreciated.  We are just discussing a small
>> change to what you were proposing, see above.
>
> I'm not offended, but I did want to encourage others to run the code and
> test the results before adding further commentary.
>
>> Or are you saying that using undecided as above doesn't do the job?
>>
>> (Sorry, I don't use Gnus, so to be able to reproduce the problem and
>> test a proposed solution I need detailed instructions, I cannot easily
>> do it myself without investing an inordinate amount of time.)
>
> The gnus-mock package on GNU ELPA may be of some help for testing. However
> I have not used it myself, nor investigated if its collection of test
> data contains a suitably malformed message.

It doesn't currently, but this is a perfect use-case for the package.
Shall I just add the up-thread message into the test data? Or can we
come up with a more-broken version of the message?




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 17:22:02 GMT) Full text and rfc822 format available.

Message #95 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: contovob <at> tcd.ie
Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 20:20:49 +0300
> Date: Thu, 02 May 2019 19:50:11 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
> 
> give credit to Noam

And to Andy, of course.  Sorry, thought it was obvious, but better
safe than sorry.




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 17:46:01 GMT) Full text and rfc822 format available.

Message #98 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Thu, 02 May 2019 18:45:30 +0100
On Thu 02 May 2019, Eric Abrahamsen wrote:

> Andy Moreton <andrewjmoreton <at> gmail.com> writes:
>> The gnus-mock package on GNU ELPA may be of some help for testing. However
>> I have not used it myself, nor investigated if its collection of test
>> data contains a suitably malformed message.
>
> It doesn't currently, but this is a perfect use-case for the package.
> Shall I just add the up-thread message into the test data? Or can we
> come up with a more-broken version of the message?

Something similar, but any test data should be anonymised so that it
does not contain personal details or real email addresses.

    AndyM





Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Thu, 02 May 2019 23:25:01 GMT) Full text and rfc822 format available.

Message #101 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Date: Thu, 2 May 2019 16:24:34 -0700
> any test data should be anonymised so that it
> does not contain personal details or real email addresses.
It's OK with me if you use my original bug report as test data, as I
think the only email addresses it contains are public ones like mine
(already nearly 3000 copies of that in the Emacs source code!) or
bug-gnu-emacs.

Thanks to all for fixing this.





Added tag(s) fixed. Request was from "Basil L. Contovounesios" <contovob <at> tcd.ie> to control <at> debbugs.gnu.org. (Fri, 03 May 2019 13:56:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 35507 <at> debbugs.gnu.org and Paul Eggert <eggert <at> cs.ucla.edu> Request was from "Basil L. Contovounesios" <contovob <at> tcd.ie> to control <at> debbugs.gnu.org. (Fri, 03 May 2019 13:56:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Fri, 03 May 2019 13:56:02 GMT) Full text and rfc822 format available.

Message #108 received at 35507-done <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507-done <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments
 from	Thunderbird
Date: Fri, 03 May 2019 14:55:37 +0100
tags 35507 fixed
close 35507
quit

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Date: Thu, 02 May 2019 19:50:11 +0300
>> From: Eli Zaretskii <eliz <at> gnu.org>
>> Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
>> 
>> give credit to Noam
>
> And to Andy, of course.  Sorry, thought it was obvious, but better
> safe than sorry.

Done (hopefully without needing to be sorry):

[24a1d5a0b5]: Fix Gnus inline attachment decoding (bug#35507)
  2019-05-03 14:52:01 +0100
  https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=24a1d5a0b5c0debd8256d71242bfa6f8448bf5af

I am thus closing this report.

Thanks,

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Fri, 03 May 2019 14:03:01 GMT) Full text and rfc822 format available.

Message #111 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com,
 Noam Postavsky <npostavs <at> gmail.com>
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Fri, 03 May 2019 15:02:01 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Noam Postavsky <npostavs <at> gmail.com>
>> Date: Thu, 02 May 2019 08:01:38 -0400
>> Cc: 35507 <at> debbugs.gnu.org
>> 
>> I understood from Paul's followup in https://debbugs.gnu.org/35507#32
>> that the report is mainly about the case where there is a Content-Type
>> header with a charset field within the body of the attachment.
>
> Yes, that's my understanding as well.  So I guess Gnus should try
> gleaning the charset from there.  The 'undecided' stuff is for when it
> fails, I think.

Question following an initial reading of (info "(elisp) Coding System
Basics"): would it be better in this case to use prefer-utf-8 instead of
undecided?  If not, why not?

Thanks,

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Fri, 03 May 2019 15:16:01 GMT) Full text and rfc822 format available.

Message #114 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com, npostavs <at> gmail.com
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Fri, 03 May 2019 18:14:32 +0300
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: Noam Postavsky <npostavs <at> gmail.com>,  <35507 <at> debbugs.gnu.org>,  <andrewjmoreton <at> gmail.com>
> Date: Fri, 03 May 2019 15:02:01 +0100
> 
> Question following an initial reading of (info "(elisp) Coding System
> Basics"): would it be better in this case to use prefer-utf-8 instead of
> undecided?  If not, why not?

Because we have no reason to prefer UTF-8 in this case.  No one tells
us that x-patch will be predominantly encoded in UTF-8.

The RFC doesn't say that UTF-8 is the default, either, and
text/x-patch is not defined anywhere with that default.  Which means
there's no default, and in that case 'undecided' is better, because it
heeds the preferences of the user.
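
The contrast between the two coding systems can be sketched like this. The helpers are hypothetical and the Latin-1 priority is just an assumed example of a user preference: `prefer-utf-8' picks UTF-8 whenever the bytes are valid UTF-8, while `undecided' defers to the priority list.

```elisp
(require 'mule-util)                    ; provides `with-coding-priority'

(defun my/decode-as (bytes coding)
  "Decode BYTES with CODING for a user assumed to prefer Latin-1.
Hypothetical helper for illustration."
  (with-coding-priority '(iso-8859-1)
    (decode-coding-string bytes coding)))

;; `prefer-utf-8' overrides the user's priority when UTF-8 fits.
(my/decode-as "\303\251" 'prefer-utf-8)

;; `undecided' consults the user's priority list first.
(my/decode-as "\303\251" 'undecided)
```

With no documented default for text/x-patch, the second behavior is the one argued for here.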




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#35507; Package emacs,gnus. (Fri, 03 May 2019 15:21:01 GMT) Full text and rfc822 format available.

Message #117 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com, npostavs <at> gmail.com
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Fri, 03 May 2019 16:20:19 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Cc: Noam Postavsky <npostavs <at> gmail.com>,  <35507 <at> debbugs.gnu.org>,  <andrewjmoreton <at> gmail.com>
>> Date: Fri, 03 May 2019 15:02:01 +0100
>> 
>> Question following an initial reading of (info "(elisp) Coding System
>> Basics"): would it be better in this case to use prefer-utf-8 instead of
>> undecided?  If not, why not?
>
> Because we have no reason to prefer UTF-8 in this case.  No one tells
> us that x-patch will be predominantly encoded in UTF-8.
>
> The RFC doesn't say that UTF-8 is the default, either, and
> text/x-patch is not defined anywhere with that default.  Which means
> there's no default, and in that case 'undecided' is better, because it
> heeds the preferences of the user.

Right, thanks for explaining.

-- 
Basil




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 01 Jun 2019 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 324 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.