GNU bug report logs - #45925
27.1; *Summary* buffer vs. raw utf-8 headers

Packages: emacs, gnus;

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Sun, 17 Jan 2021 05:37:02 UTC

Severity: minor

Tags: fixed

Found in version 27.1

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 45925 in the body.
You can then email your comments to 45925 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#45925; Package emacs,gnus. (Sun, 17 Jan 2021 05:37:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org. (Sun, 17 Jan 2021 05:37:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.1; *Summary* buffer vs. raw utf-8 headers
Date: Sun, 17 Jan 2021 13:35:57 +0800
Try this simple experiment:
$ echo Subject: 一二三|procmail
$ echo Subject: 一二三|iconv -t big5|procmail
$ emacs -f gnus

In the *Article* buffer, both look like
Subject: 一二三
In the *Summary* buffer so does the big5 version.
Alas, the utf-8 version looks like
c\x80\xd3....

(Yes, these are illegal raw headers. But Gnus is supposed to be
accommodating. And it does... but oddly not for the majority (UTF-8) case.)

Important settings:
  value of $LC_COLLATE: C
  value of $LC_CTYPE: zh_TW.UTF-8
  value of $LC_MESSAGES: C
  value of $LANG: zh_TW.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

(Might be related to bug#45724.)

(https://www.jidanni.org/comp/configuration/ has my dot files.)





Message #8 received at 45925 <at> debbugs.gnu.org (full text, mbox):

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: 45925 <at> debbugs.gnu.org, 45724 <at> debbugs.gnu.org
Subject: Happens since 27.1
Date: Tue, 19 Jan 2021 09:04:16 +0800
Note that with emacs-version "27.1", certain old messages that have been
sitting in my *Summary* buffer for years have suddenly had their Subjects
garbled. (They still look fine in the *Article* buffer.)





Message #11 received at 45925 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 45925 <at> debbugs.gnu.org
Subject: Re: bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
Date: Tue, 19 Jan 2021 06:30:44 +0100
積丹尼 Dan Jacobson <jidanni <at> jidanni.org> writes:

> Try this simple experiment:
> $ echo Subject: 一二三|procmail
> $ echo Subject: 一二三|iconv -t big5|procmail

I don't have procmail installed, so I'm not sure what these do -- are
you sending a mail (to yourself?) here?  Do you have a recipe to
reproduce this problem without the use of procmail?

> $ emacs -f gnus
>
> In the *Article* buffer, both look like
> Subject: 一二三
> In the *Summary* buffer so does the big5 version.
> Alas, the utf-8 version looks like
> c\x80\xd3....
>
> (Yes, these are illegal raw headers. But Gnus is supposed to be
> accommodating. And it does... but oddly not for the majority (UTF-8) case.)

[...]

> (Might be related to bug#45724.)

Is this still with nnml?  If so, could you find the resulting lines in
the .overview files in the nnml directory and post them here?  (Perhaps
after gzipping them to avoid Emacs helpfully re-encoding the lines.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





Message #14 received at 45925 <at> debbugs.gnu.org (full text, mbox):

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 45925 <at> debbugs.gnu.org
Subject: Re: bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
Date: Wed, 20 Jan 2021 13:58:08 +0800
>>>>> "LI" == Lars Ingebrigtsen <larsi <at> gnus.org> writes:
LI> 積丹尼 Dan Jacobson <jidanni <at> jidanni.org> writes:

>> Try this simple experiment:
>> $ echo Subject: 一二三|procmail
>> $ echo Subject: 一二三|iconv -t big5|procmail

LI> I don't have procmail installed, so I'm not sure what these do -- are
LI> you sending a mail (to yourself?) here?  Do you have a recipe to
LI> reproduce this problem without the use of procmail?

$ echo Subject: 一二三 > ~/Maildir/new/Z
$ file ~/Maildir/new/Z
~/Maildir/new/Z: UTF-8 Unicode text


>> $ emacs -f gnus
>> 
>> In the *Article* buffer, both look like
>> Subject: 一二三
>> In the *Summary* buffer so does the big5 version.
>> Alas, the utf-8 version looks like
>> c\x80\xd3....
>> 
>> (Yes, these are illegal raw headers. But Gnus is supposed to be
>> accommodating. And it does... but oddly not for the majority (UTF-8) case.)

LI> [...]

>> (Might be related to bug#45724.)

LI> Is this still with nnml?  If so, could you find the resulting lines in
LI> the .overview files in the nnml directory and post them here?  (Perhaps
LI> after gzipping them to avoid Emacs helpfully re-encoding the lines.)

Yes, nnml.

The headers get appended raw to .overview.

Thus .overview contains a mix of ASCII, big5, and UTF-8, all in the same file.

$ echo Subject: 一二三|iconv -t big5 > ~/Maildir/new/B5
$ echo Subject: 一二三 > ~/Maildir/new/UT
$ emacs -f gnus
$ tail -n 2 Mail/mail/misc/.overview|qprint -e
37397   =A4@=A4G=A4T    (nobody)                <87a6t4gnpx.5.fsf <at> totally-fudged-out-mess=
age-id>         0       0       Xref: jidanni5 mail.misc:37397=09
37398   =E4=B8=80=E4=BA=8C=E4=B8=89     (nobody)                <878s8ognpx.5.fsf <at> totally-=
fudged-out-message-id>          0       0       Xref: jidanni5 mail.misc:37398=09

Anyway: *Summary* oddly can only deal with raw big5, not raw UTF-8.
However *Article* can deal with both.
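
As an illustration (not part of the original message), the two Subject fields in the qprint output above can be decoded to confirm that they carry the same text in two different encodings. This is a minimal Python sketch; the variable names and the `is_valid_utf8` helper are assumptions made for the example and have nothing to do with how Gnus itself reads .overview.

```python
import quopri

# Subject fields copied from the qprint output above.
big5_qp = b"=A4@=A4G=A4T"
utf8_qp = b"=E4=B8=80=E4=BA=8C=E4=B8=89"

# Undo the quoted-printable escaping to recover the raw octets
# that were appended to .overview.
big5_raw = quopri.decodestring(big5_qp)
utf8_raw = quopri.decodestring(utf8_qp)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octets form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Both fields decode to the same three characters once the
# right charset is chosen for each.
print(big5_raw.decode("big5"))
print(utf8_raw.decode("utf-8"))

# The Big5 octets are not valid UTF-8, so a reader cannot apply
# one charset to the whole file.
print(is_valid_utf8(big5_raw), is_valid_utf8(utf8_raw))
```

The point of the sketch is only that the file holds raw octets in more than one charset, so any reader has to pick a charset per field rather than per file.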





Message #17 received at 45925 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 45925 <at> debbugs.gnu.org
Subject: Re: bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
Date: Wed, 20 Jan 2021 17:31:58 +0100
積丹尼 Dan Jacobson <jidanni <at> jidanni.org> writes:

> LI> I don't have procmail installed, so I'm not sure what these do -- are
> LI> you sending a mail (to yourself?) here?  Do you have a recipe to
> LI> reproduce this problem without the use of procmail?
>
> $ echo Subject: 一二三 > ~/Maildir/new/Z
> $ file ~/Maildir/new/Z
> ~/Maildir/new/Z: UTF-8 Unicode text

I thought this was about nnml?  Is ~/Maildir/new/Z your nnml directory?

> LI> Is this still with nnml?  If so, could you find the resulting lines in
> LI> the .overview files in the nnml directory and post them here?  (Perhaps
> LI> after gzipping them to avoid Emacs helpfully re-encoding the lines.)
>
> Yes, nnml.
>
> The headers get appended raw to .overview.
>
> Thus .overview contains a mix of ASCII, big5, and UTF-8, all in the same file.
>
> $ echo Subject: 一二三|iconv -t big5 > ~/Maildir/new/B5
> $ echo Subject: 一二三 > ~/Maildir/new/UT
> $ emacs -f gnus
> $ tail -n 2 Mail/mail/misc/.overview|qprint -e
> 37397 =A4@=A4G=A4T (nobody) <87a6t4gnpx.5.fsf <at> totally-fudged-out-mess=
> age-id>         0       0       Xref: jidanni5 mail.misc:37397=09
> 37398 =E4=B8=80=E4=BA=8C=E4=B8=89 (nobody) <878s8ognpx.5.fsf <at> totally-=
> fudged-out-message-id> 0 0 Xref: jidanni5 mail.misc:37398=09

There was just ASCII in the part you posted.  Could you gzip it, as I
asked you to?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





Message #20 received at 45925 <at> debbugs.gnu.org (full text, mbox):

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 45925 <at> debbugs.gnu.org
Subject: Re: bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
Date: Fri, 22 Jan 2021 03:55:44 +0800
LI> I thought this was about nnml?  Is ~/Maildir/new/Z your nnml directory?

No. I just made a file where gnus gets its mail from when I hit "g".
g runs the command gnus-group-get-new-news

Anyway, all you need to do to reproduce this bug, is to have somebody
send you a mail with raw UTF-8 in the Subject header.

LI> There was just ASCII in the part you posted.  Could you gzip it, as I
LI> asked you to?

$ perl -nwle 'print if /\P{ASCII}/' Mail/mail/misc/.overview > /tmp/h
$ gzip /tmp/h
[h.gz (application/gzip, attachment)]
Here you will see a mix of raw UTF-8, raw big5, all in the same file.
The raw big5 works fine, but the raw UTF-8 looks garbled, in the summary
buffer. In the article buffer, all look fine.
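
For readers who want to inspect such a mixed file outside Emacs, here is a hedged Python sketch that decodes each tab-separated overview field on its own, trying strict UTF-8 first and Big5 second. The names `CANDIDATES`, `decode_field`, and `decode_overview_line` are hypothetical, the candidate list is an assumption chosen for this report, and this is not how Gnus guesses charsets.

```python
# Charsets to try, in order. Strict UTF-8 goes first because
# text in other multibyte charsets rarely passes as valid UTF-8.
CANDIDATES = ("utf-8", "big5")


def decode_field(raw: bytes) -> str:
    """Decode one field with the first candidate charset that fits."""
    for charset in CANDIDATES:
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every octet, so this never fails.
    return raw.decode("latin-1")


def decode_overview_line(line: bytes) -> list:
    """Split an overview line on tabs and decode each field separately."""
    return [decode_field(field) for field in line.rstrip(b"\r\n").split(b"\t")]


# Shortened sample lines modelled on the two entries in this report.
big5_line = b"37397\t\xa4@\xa4G\xa4T\t(nobody)\n"
utf8_line = b"37398\t\xe4\xb8\x80\xe4\xba\x8c\xe4\xb8\x89\t(nobody)\n"

print(decode_overview_line(big5_line))
print(decode_overview_line(utf8_line))
```

Guessing like this can still pick the wrong charset for short or ambiguous fields, so it is a diagnostic aid rather than a fix.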

Here are all my config files:
https://www.jidanni.org/comp/configuration/.emacs
https://www.jidanni.org/comp/configuration/.gnus.el
https://www.jidanni.org/comp/configuration/.emacs-custom.el
https://www.jidanni.org/comp/configuration/.emacs-w3m


Message #23 received at 45925 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: larsi <at> gnus.org, 45925 <at> debbugs.gnu.org
Subject: Re: bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
Date: Thu, 21 Jan 2021 22:22:56 +0200
> From: 積丹尼 Dan Jacobson
>  <jidanni <at> jidanni.org>
> Date: Fri, 22 Jan 2021 03:55:44 +0800
> Cc: 45925 <at> debbugs.gnu.org
> 
> Here you will see a mix of raw UTF-8, raw big5, all in the same file.
> The raw big5 works fine, but the raw UTF-8 looks garbled, in the summary
> buffer. In the article buffer, all look fine.

Why do you expect a mixed-encoding stuff to work in Emacs?  Emacs only
supports a single encoding of any chunk of text it gets, be it a file
or an email message.

Files such as this one are simply not supported.





Message #26 received at 45925 <at> debbugs.gnu.org (full text, mbox):

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: larsi <at> gnus.org, 45925 <at> debbugs.gnu.org
Subject: Re: bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
Date: Fri, 22 Jan 2021 04:54:17 +0800
>>>>> "EZ" == Eli Zaretskii <eliz <at> gnu.org> writes:
EZ> Why do you expect a mixed-encoding stuff to work in Emacs?  Emacs only
EZ> supports a single encoding of any chunk of text it gets, be it a file
EZ> or an email message.

EZ> Files such as this one are simply not supported.

So, Gnus should not just randomly slap raw lines into the same file.
That is the root of all problems!





Message #29 received at 45925 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 45925 <at> debbugs.gnu.org,
 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Subject: Re: bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
Date: Fri, 22 Jan 2021 19:06:00 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> Why do you expect a mixed-encoding stuff to work in Emacs?  Emacs only
> supports a single encoding of any chunk of text it gets, be it a file
> or an email message.
>
> Files such as this one are simply not supported.

Sure they are.  It's not a text file; it's an octet stream.

But as Dan points out, Gnus doesn't handle these invalid mails
optimally, and doing some RFC2047-encoding to the headers before writing
the .overview file will help a bit here, so I've now done that in Emacs
28.

(Gnus will still display some of these headers "wrong" in the summary
buffer, and display them "right" in the article buffer, because Gnus has
to guess at what the charset is, and it does further guessing in the
article buffer than in the summary buffer, for reasons of efficiency.)
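
To show what RFC 2047 encoding of a header looks like in general, here is a small Python sketch using the standard `email.header` module. It is not the Emacs 28 change itself; it assumes the charset of the raw Subject is already known to be UTF-8, which is exactly the part Gnus has to guess.

```python
from email.header import Header, decode_header

# The Subject text from this report, assumed to be UTF-8.
subject = "一二三"

# Encode it as an RFC 2047 encoded-word: the result is pure ASCII
# and names its own charset, so it is safe to store next to
# entries that originally used other encodings.
encoded = Header(subject, charset="utf-8").encode()
print(encoded)

# A reader can recover the text without guessing, because the
# charset travels with the encoded-word.
(raw, charset), = decode_header(encoded)
print(raw.decode(charset))
```

Once every non-ASCII header is stored this way, the overview file contains only ASCII, and the mixed-encoding problem moves to the single moment when the raw header is first read.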

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Added tag(s) fixed. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 22 Jan 2021 18:09:01 GMT) Full text and rfc822 format available.

bug marked as fixed in version 28.1, send any further explanations to 45925 <at> debbugs.gnu.org and 積丹尼 Dan Jacobson <jidanni <at> jidanni.org> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 22 Jan 2021 18:09:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 20 Feb 2021 12:24:06 GMT) Full text and rfc822 format available.
