GNU bug report logs - #44307
27.1; UTF-8 parts transferred as 8bit in multipart messages fail to decode


Packages: emacs, gnus;

Reported by: Thomas Schneider <qsx <at> chaotikum.eu>

Date: Thu, 29 Oct 2020 14:12:01 UTC

Severity: normal

Tags: fixed

Merged with 45657

Found in version 27.1

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 44307 in the body.
You can then email your comments to 44307 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Thu, 29 Oct 2020 14:12:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Thomas Schneider <qsx <at> chaotikum.eu>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org. (Thu, 29 Oct 2020 14:12:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Thomas Schneider <qsx <at> chaotikum.eu>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.1; UTF-8 parts transferred as 8bit in multipart messages fail to
 decode
Date: Thu, 29 Oct 2020 15:09:14 +0100
Hi,

Gnus fails to display certain messages correctly.  So far, I have
identified the following conditions, though there might be other
constellations:

- The message is multipart/alternative
- The text/* part is UTF-8
- Content-Transfer-Encoding is 8bit

Any non-ASCII printable characters (e.g., German umlauts) are
interpreted as something else (probably ISO 8859) in the HTML part, or
not interpreted at all and displayed as octal escapes in the plain part.

A working example can be found below – uuencoded to prevent any further
charset transfer issues.  It should display 'ääää', that is, four 'a'
with a trema above, in either part.  I’m unsure how to reproduce it
without a working Gnus setup.  I inserted the message in a nnmaildir
group to display it, so this way should definitely work.

begin 644 example.eml
M1G)O;3H <at> 97AA;7!L92`\97AA;7!L94!E>&%M<&QE+F]R9SX-"E1O.B!E>&%M
M<&QE(#QE>&%M<&QE0&5X86UP;&4N;W)G/@T*0V]N=&5N="U4>7!E.B!M=6QT
M:7!A<G0O86QT97)N871I=F4[(&)O=6YD87)Y/2(]/3T]/3T]/3T]/3T]/3TR
M.#<W,3DU,#<U.30V.3<T,C0V/3TB#0I$871E.B!4:'4L(#(Y($]C="`R,#(P
M(#$T.C0W.C4U("LP,3`P#0I-24U%+59E<G-I;VXZ(#$N,`T*4W5B:F5C=#H@
M=&5S=`T*#0HM+3T]/3T]/3T]/3T]/3T]/3(X-S<Q.34P-S4Y-#8Y-S0R-#8]
M/0T*0V]N=&5N="U4>7!E.B!T97AT+W!L86EN.R!C:&%R<V5T/2)U=&8M."(-
M"D-O;G1E;G0M5')A;G-F97(M16YC;V1I;F<Z(#AB:70-"@T*PZ3#I,.DPZ0-
M"@T*+2T]/3T]/3T]/3T]/3T]/3TR.#<W,3DU,#<U.30V.3<T,C0V/3T-"D-O
M;G1E;G0M5'EP93H@=&5X="]H=&UL.R!C:&%R<V5T/2)U=&8M."(-"D-O;G1E
M;G0M5')A;G-F97(M16YC;V1I;F<Z(#AB:70-"@T*/"%D;V-T>7!E(&AT;6P^
M/&AT;6P^/&AE860^/&UE=&$@:'1T<"UE<75I=CTB8V]N=&5N="UT>7!E(B!C
M;VYT96YT/2)T97AT+VAT;6P[(&-H87)S970]551&+3 <at> B/CPO:&5A9#X\8F]D
M>3[#I,.DPZ3#I#PO8F]D>3X\+VAT;6P^#0H-"BTM/3T]/3T]/3T]/3T]/3T]
9,C <at> W-S$Y-3`W-3DT-CDW-#(T-CT]+2T-"@``
`
end

Thanks for your time,
Thomas


In GNU Emacs 27.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.22, cairo version 1.16.0)
 of 2020-09-01 built on localhost
Windowing system distributor 'The X.Org Foundation', version 11.0.12008000
System Description: Gentoo/Linux

Recent messages:
Checking new news...done
Mark set
A bookmark has been added to the current article. [2 times]
Moved to bookmark [2 times]
Auto-saving...done

Configured using:
 'configure --prefix=/usr --build=x86_64-pc-linux-gnu
 --host=x86_64-pc-linux-gnu --mandir=/usr/share/man
 --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc
 --localstatedir=/var/lib --disable-silent-rules
 --docdir=/usr/share/doc/emacs-27.1-r1
 --htmldir=/usr/share/doc/emacs-27.1-r1/html --libdir=/usr/lib64
 --program-suffix=-emacs-27 --includedir=/usr/include/emacs-27
 --infodir=/usr/share/info/emacs-27 --localstatedir=/var
 --enable-locallisppath=/etc/emacs:/usr/share/emacs/site-lisp
 --without-compress-install --without-hesiod --without-pop
 --with-dumping=pdumper --with-file-notification=inotify --enable-acl
 --with-dbus --without-modules --without-gameuser --with-libgmp
 --with-gpm --without-json --with-kerberos --with-kerberos5 --with-lcms2
 --with-xml2 --without-mailutils --without-selinux --with-gnutls
 --with-libsystemd --with-threads --without-wide-int --with-zlib
 --with-sound=alsa --with-x --without-ns --without-gconf
 --with-gsettings --with-toolkit-scroll-bars --with-gif --with-jpeg
 --with-png --with-rsvg --with-tiff --with-xpm --with-imagemagick
 --with-xft --with-cairo --without-harfbuzz --with-libotf
 --with-m17n-flt --with-x-toolkit=gtk3 --without-xwidgets 'CFLAGS=-O2
 -pipe -march=native -g' CPPFLAGS= 'LDFLAGS=-Wl,-O1 -Wl,--as-needed''

Configured features:
XPM JPEG TIFF GIF PNG RSVG CAIRO IMAGEMAGICK SOUND GPM DBUS GSETTINGS
GLIB NOTIFY INOTIFY ACL GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM THREADS LIBSYSTEMD PDUMPER LCMS2
GMP

Important settings:
  value of $LC_MESSAGES: en_GB.UTF-8
  value of $LC_TIME: en_DK.UTF-8
  value of $LANG: de_DE.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Group

Minor modes in effect:
  hl-line-mode: t
  cursor-sensor-mode: t
  gnus-undo-mode: t
  TeX-PDF-mode: t
  TeX-source-correlate-mode: t
  pdf-occur-global-minor-mode: t
  stripe-buffer-mode: t
  helm-mode: t
  helm-ff-cache-mode: t
  helm-autoresize-mode: t
  helm--remap-mouse-mode: t
  global-magit-file-mode: t
  magit-auto-revert-mode: t
  global-git-commit-mode: t
  async-bytecomp-package-mode: t
  shell-dirtrack-mode: t
  recentf-mode: t
  global-semanticdb-minor-mode: t
  global-semantic-idle-scheduler-mode: t
  semantic-mode: t
  display-battery-mode: t
  display-time-mode: t
  show-paren-mode: t
  override-global-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/home/qsx/.emacs.d/elpa/dpkg-dev-el-20190824.2314/debian-autoloads hides /home/qsx/.emacs.d/elpa/debian-el-20201011.1543/debian-autoloads
/home/qsx/.emacs.d/elpa/auctex-12.2.5/context hides /usr/share/emacs/site-lisp/auctex/context
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex hides /usr/share/emacs/site-lisp/auctex/tex
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-fold hides /usr/share/emacs/site-lisp/auctex/tex-fold
/home/qsx/.emacs.d/elpa/auctex-12.2.5/preview hides /usr/share/emacs/site-lisp/auctex/preview
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-font hides /usr/share/emacs/site-lisp/auctex/tex-font
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-ispell hides /usr/share/emacs/site-lisp/auctex/tex-ispell
/home/qsx/.emacs.d/elpa/auctex-12.2.5/latex-flymake hides /usr/share/emacs/site-lisp/auctex/latex-flymake
/home/qsx/.emacs.d/elpa/auctex-12.2.5/texmathp hides /usr/share/emacs/site-lisp/auctex/texmathp
/home/qsx/.emacs.d/elpa/auctex-12.2.5/context-en hides /usr/share/emacs/site-lisp/auctex/context-en
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-mik hides /usr/share/emacs/site-lisp/auctex/tex-mik
/home/qsx/.emacs.d/elpa/auctex-12.2.5/context-nl hides /usr/share/emacs/site-lisp/auctex/context-nl
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-info hides /usr/share/emacs/site-lisp/auctex/tex-info
/home/qsx/.emacs.d/elpa/auctex-12.2.5/multi-prompt hides /usr/share/emacs/site-lisp/auctex/multi-prompt
/home/qsx/.emacs.d/elpa/auctex-12.2.5/toolbar-x hides /usr/share/emacs/site-lisp/auctex/toolbar-x
/home/qsx/.emacs.d/elpa/auctex-12.2.5/font-latex hides /usr/share/emacs/site-lisp/auctex/font-latex
/home/qsx/.emacs.d/elpa/auctex-12.2.5/auctex hides /usr/share/emacs/site-lisp/auctex/auctex
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-bar hides /usr/share/emacs/site-lisp/auctex/tex-bar
/home/qsx/.emacs.d/elpa/auctex-12.2.5/latex hides /usr/share/emacs/site-lisp/auctex/latex
/home/qsx/.emacs.d/elpa/auctex-12.2.5/plain-tex hides /usr/share/emacs/site-lisp/auctex/plain-tex
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-style hides /usr/share/emacs/site-lisp/auctex/tex-style
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-buf hides /usr/share/emacs/site-lisp/auctex/tex-buf
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-jp hides /usr/share/emacs/site-lisp/auctex/tex-jp
/home/qsx/.emacs.d/elpa/auctex-12.2.5/tex-site hides /usr/share/emacs/site-lisp/auctex/tex-site
/home/qsx/.emacs.d/elpa/auctex-12.2.5/bib-cite hides /usr/share/emacs/site-lisp/auctex/bib-cite
/usr/share/emacs/site-lisp/cmake-mode hides /usr/share/emacs/site-lisp/cmake/cmake-mode
/home/qsx/.emacs.d/elpa/dash-functional-20200617.702/dash-functional hides /usr/share/emacs/site-lisp/dash/dash-functional
/home/qsx/.emacs.d/elpa/dash-20200803.1520/dash hides /usr/share/emacs/site-lisp/dash/dash
/usr/share/emacs/site-lisp/desktop-entry-mode hides /usr/share/emacs/site-lisp/desktop-file-utils/desktop-entry-mode
/home/qsx/.emacs.d/elpa/f-20191110.1357/f hides /usr/share/emacs/site-lisp/f/f
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-post hides /usr/share/emacs/site-lisp/ledger-mode/ledger-post
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-navigate hides /usr/share/emacs/site-lisp/ledger-mode/ledger-navigate
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-reconcile hides /usr/share/emacs/site-lisp/ledger-mode/ledger-reconcile
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-commodities hides /usr/share/emacs/site-lisp/ledger-mode/ledger-commodities
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-texi hides /usr/share/emacs/site-lisp/ledger-mode/ledger-texi
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-init hides /usr/share/emacs/site-lisp/ledger-mode/ledger-init
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-fontify hides /usr/share/emacs/site-lisp/ledger-mode/ledger-fontify
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-report hides /usr/share/emacs/site-lisp/ledger-mode/ledger-report
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-sort hides /usr/share/emacs/site-lisp/ledger-mode/ledger-sort
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-exec hides /usr/share/emacs/site-lisp/ledger-mode/ledger-exec
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-mode hides /usr/share/emacs/site-lisp/ledger-mode/ledger-mode
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-check hides /usr/share/emacs/site-lisp/ledger-mode/ledger-check
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-test hides /usr/share/emacs/site-lisp/ledger-mode/ledger-test
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-occur hides /usr/share/emacs/site-lisp/ledger-mode/ledger-occur
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-xact hides /usr/share/emacs/site-lisp/ledger-mode/ledger-xact
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-regex hides /usr/share/emacs/site-lisp/ledger-mode/ledger-regex
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-context hides /usr/share/emacs/site-lisp/ledger-mode/ledger-context
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-state hides /usr/share/emacs/site-lisp/ledger-mode/ledger-state
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-complete hides /usr/share/emacs/site-lisp/ledger-mode/ledger-complete
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-schedule hides /usr/share/emacs/site-lisp/ledger-mode/ledger-schedule
/home/qsx/.emacs.d/elpa/ledger-mode-20200530.1710/ledger-fonts hides /usr/share/emacs/site-lisp/ledger-mode/ledger-fonts
/home/qsx/.emacs.d/elpa/s-20180406.808/s hides /usr/share/emacs/site-lisp/s/s
/home/qsx/.emacs.d/elpa/with-editor-20200930.1912/with-editor hides /usr/share/emacs/site-lisp/with-editor/with-editor
/usr/share/emacs/site-lisp/mercury/gud hides /usr/share/emacs/27.1/lisp/progmodes/gud
/usr/share/emacs/site-lisp/mercurial/mercurial hides /home/qsx/.emacs.d/elisp/mercurial

Features:
(shadow nnir emacsbug sendmail helm-x-files helm-for-files helm-bookmark
helm-adaptive helm-external helm-net mule-util flow-fill ace-window avy
eieio-opt rfc1843 help-fns radix-tree gnus-cite smiley mm-archive
mail-extr gnus-bcklg qp helm-config gnus-async sort gnus-ml disp-table
hl-line cursor-sensor nnagent nnml nndraft nnmh nnfolder nnmaildir nnnil
gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art
mm-uu mml2015 mm-view mml-smime smime dig nntp gnus-cache gnus-sum url
url-proxy url-privacy url-expand url-methods url-history mailcap shr
url-cookie url-domsuf svg dom gnus-group gnus-undo gnus-start gnus-cloud
nnimap nnmail mail-source utf7 netrc nnoo gnus-spec gnus-int gnus-range
gnus-win winner helm-command helm-elisp helm-eval edebug backtrace
helm-info gnus-alias gnus nnheader calc calc-loaddefs calc-macs
ledger-mode ledger-check ledger-texi ledger-test ledger-sort
ledger-report ledger-reconcile ledger-occur ledger-fonts ledger-fontify
ledger-state ledger-complete ledger-schedule ledger-init ledger-xact
ledger-post ledger-exec ledger-navigate eshell esh-cmd esh-ext esh-opt
esh-proc esh-io esh-arg esh-module esh-groups esh-util ledger-context
ledger-commodities org org-macro org-footnote org-pcomplete org-list
org-faces org-entities org-version ob-sqlite ob ob-tangle org-src ob-ref
ob-lob ob-table ob-exp ob-comint ob-emacs-lisp ob-core ob-eval org-table
ol org-keys org-compat org-macs org-loaddefs cal-menu calendar
cal-loaddefs ledger-regex haskell-mode haskell-cabal haskell-utils
haskell-font-lock haskell-indentation haskell-string
haskell-sort-imports haskell-lexeme haskell-align-imports
haskell-complete-module haskell-ghc-support etags fileloop generator
dabbrev haskell-customize adoc-mode tempo markup-faces json-mode
json-reformat json-snatcher js toml-mode conf-mode align auctex-latexmk
tex-buf latex latex-flymake flymake-proc flymake tex-ispell tex-style
tex dbus xml texmathp pdf-occur ibuf-ext ibuffer ibuffer-loaddefs
tablist tablist-filter semantic/wisent/comp semantic/wisent
semantic/wisent/wisent pdf-isearch let-alist pdf-misc pdf-tools cus-edit
pdf-view magit-bookmark bookmark jka-compr pdf-cache pdf-info tq
pdf-util image-mode exif dockerfile-mode sh-script executable
poly-ansible poly-ansible-jinja2-filters polymode poly-lock
polymode-base polymode-weave polymode-export polymode-compat
polymode-methods polymode-core polymode-classes ansible salt-mode rst
mmm-jinja2 mmm-auto mmm-vars mmm-utils mmm-compat yaml-mode
rainbow-delimiters stripe-buffer meson-mode smie apache-mode form-feed
helm-rg helm-mode helm-files tramp tramp-loaddefs trampver
tramp-integration files-x tramp-compat parse-time iso8601 ls-lisp
helm-buffers helm-occur helm-tags helm-locate helm-grep helm-regexp
helm-utils helm-help helm-types helm helm-global-bindings helm-easymenu
helm-source eieio-compat helm-multi-match helm-lib magit-submodule
magit-obsolete magit-blame magit-stash magit-reflog magit-bisect
magit-push magit-pull magit-fetch magit-clone magit-remote magit-commit
magit-sequence magit-notes magit-worktree magit-tag magit-merge
magit-branch magit-reset magit-files magit-refs magit-status magit
magit-repos magit-apply magit-wip magit-log which-func magit-diff
smerge-mode diff diff-mode magit-core magit-autorevert autorevert
magit-margin magit-transient magit-process magit-mode git-commit
transient magit-git magit-section magit-utils crm log-edit message dired
dired-loaddefs format-spec rfc822 mml mml-sec epa derived epg epg-config
gnus-util rmail rmail-loaddefs text-property-search time-date mm-decode
mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 mm-util
ietf-drums mail-prsvr mailabbrev mail-utils gmm-utils mailheader
pcvs-util add-log with-editor async-bytecomp advice async shell
pcomplete lsp-ui lsp-ui-doc goto-addr lsp-ui-imenu lsp-ui-peek
lsp-ui-sideline face-remap lsp-mode yasnippet xref project url-util
spinner network-stream puny nsm rmc markdown-mode color thingatpt
noutline outline lv inline imenu filenotify f ewoc dash-functional
compile comint ansi-color bindat lsp-protocol ht srefactor srefactor-ui
recentf tree-widget cl srecode/semantic semantic/senator
semantic/decorate pulse srecode/insert srecode/filters srecode/args
ede/speedbar ede/files ede ede/detect ede/base ede/auto ede/source
eieio-speedbar speedbar sb-image dframe eieio-custom wid-edit
srecode/find srecode/map srecode/ctxt srecode/compile srecode/dictionary
srecode/fields srecode/table srecode semantic/doc semantic/tag-file
quilt semantic/db-file data-debug ring cedet-files semantic/bovine/c
hideif semantic/bovine/c-by semantic/lex-spp semantic/bovine/gcc
semantic/dep semantic/bovine semantic/analyze/refs semantic/db-find
semantic/db-ref cc-mode cc-fonts cc-guess cc-menus cc-cmds cc-styles
cc-align cc-engine cc-vars cc-defs semantic/db-mode semantic/idle
semantic/analyze semantic/sort semantic/scope semantic/analyze/fcn
semantic/db eieio-base semantic/format ezimage semantic/tag-ls
semantic/find semantic/ctxt semantic/util-modes semantic/util semantic
pp semantic/tag semantic/lex semantic/fw mode-local find-func cedet
company-shell dash company-ansible company-ansible-keywords
company-reftex s reftex-cite reftex reftex-loaddefs reftex-vars
company-bibtex parsebib bibtex warnings company-math math-symbol-lists
company edmacro kmacro pcase cl-extra help-mode battery time
deeper-blue-theme paren cus-start cus-load server use-package
use-package-ensure use-package-delight use-package-diminish
use-package-bind-key bind-key easy-mmode use-package-core finder-inf
site-gentoo w3m-load preview-latex erlang-start tex-site dpkg-dev-el
debian-el rx info package easymenu browse-url url-handlers url-parse
auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache json subr-x map url-vars seq byte-opt gv bytecomp
byte-compile cconv cl-loaddefs cl-lib tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads dbusbind inotify lcms2 dynamic-setting
system-font-setting font-render-setting cairo move-toolbar gtk x-toolkit
x multi-tty make-network-process emacs)

Memory information:
((conses 16 3763758 413383)
 (symbols 48 66383 3)
 (strings 32 2065449 238006)
 (string-bytes 1 108958761)
 (vectors 16 707848)
 (vector-slots 8 10736219 416300)
 (floats 8 667 485)
 (intervals 56 8152 1322)
 (buffers 1000 47))




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Fri, 30 Oct 2020 13:11:01 GMT) Full text and rfc822 format available.

Message #8 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Thomas Schneider <qsx <at> chaotikum.eu>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Fri, 30 Oct 2020 14:10:16 +0100
Thomas Schneider <qsx <at> chaotikum.eu> writes:

> A working example can be found below – uuencoded to prevent any further
> charset transfer issues.  It should display 'ääää', that is, four 'a'
> with a trema above, in either part.  I’m unsure how to reproduce it
> without a working Gnus setup.  I inserted the message in a nnmaildir
> group to display it, so this way should definitely work.
>
> From: example <example <at> example.org>
> Subject: test
> To: example <example <at> example.org>
> Date: Thu, 29 Oct 2020 14:47:55 +0100 (23 hours, 21 minutes, 37 seconds ago)
>
> dddd
> ----------

Something has badly mangled the message in transport.  Can you gzip the
file and include it as an attachment instead?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Fri, 30 Oct 2020 13:27:02 GMT) Full text and rfc822 format available.

Message #11 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org, Thomas Schneider <qsx <at> chaotikum.eu>
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Fri, 30 Oct 2020 14:26:54 +0100
On Oct 30 2020, Lars Ingebrigtsen wrote:

> Thomas Schneider <qsx <at> chaotikum.eu> writes:
>
>> A working example can be found below – uuencoded to prevent any further
>> charset transfer issues.  It should display 'ääää', that is, four 'a'
>> with a trema above, in either part.  I’m unsure how to reproduce it
>> without a working Gnus setup.  I inserted the message in a nnmaildir
>> group to display it, so this way should definitely work.
>>
>> From: example <example <at> example.org>
>> Subject: test
>> To: example <example <at> example.org>
>> Date: Thu, 29 Oct 2020 14:47:55 +0100 (23 hours, 21 minutes, 37 seconds ago)
>>
>> dddd
>> ----------
>
> Something has badly mangled the message in transport.  Can you gzip the
> file and include it as an attachment instead?

Try setting mm-dissect-disposition to "attachment".

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Fri, 30 Oct 2020 13:29:01 GMT) Full text and rfc822 format available.

Message #14 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Thomas Schneider <qsx <at> chaotikum.eu>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Fri, 30 Oct 2020 14:28:24 +0100
On Oct 29 2020, Thomas Schneider wrote:

> Any non-ASCII printable characters (e.g., German umlauts) are
> interpreted as something else (probably ISO 8859) in the HTML part, or
> not interpreted at all and displayed as octal escapes in the plain part.

Looks like double decoding.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Fri, 30 Oct 2020 13:37:02 GMT) Full text and rfc822 format available.

Message #17 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 44307 <at> debbugs.gnu.org, Thomas Schneider <qsx <at> chaotikum.eu>
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Fri, 30 Oct 2020 14:35:53 +0100
Andreas Schwab <schwab <at> linux-m68k.org> writes:

> Try setting mm-dissect-disposition to "attachment".

The message (containing uuencoded data) is mangled in any case.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Fri, 30 Oct 2020 14:54:01 GMT) Full text and rfc822 format available.

Message #20 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org, Thomas Schneider <qsx <at> chaotikum.eu>
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Fri, 30 Oct 2020 15:53:33 +0100
On Oct 30 2020, Lars Ingebrigtsen wrote:

> Andreas Schwab <schwab <at> linux-m68k.org> writes:
>
>> Try setting mm-dissect-disposition to "attachment".
>
> The message (containing uuencoded data) is mangled in any case.

Even if you add '(uu . disabled) to mm-uu-configure-list?

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Sun, 01 Nov 2020 12:12:01 GMT) Full text and rfc822 format available.

Message #23 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 44307 <at> debbugs.gnu.org, Thomas Schneider <qsx <at> chaotikum.eu>
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Sun, 01 Nov 2020 13:10:59 +0100
Andreas Schwab <schwab <at> linux-m68k.org> writes:

>> The message (containing uuencoded data) is mangled in any case.
>
> Even if you add '(uu . disabled) to mm-uu-configure-list?

Even if I look at the raw message as posted to the bugs mailing list --
the uuencoded data doesn't seem right when I try to decode it.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Sun, 01 Nov 2020 12:16:01 GMT) Full text and rfc822 format available.

Message #26 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org, Thomas Schneider <qsx <at> chaotikum.eu>
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Sun, 01 Nov 2020 13:15:50 +0100
On Nov 01 2020, Lars Ingebrigtsen wrote:

> Andreas Schwab <schwab <at> linux-m68k.org> writes:
>
>>> The message (containing uuencoded data) is mangled in any case.
>>
>> Even if you add '(uu . disabled) to mm-uu-configure-list?
>
> Even if I look at the raw message as posted to the bugs mailing list --
> the uuencoded data doesn't seem right when I try to decode it.

Did you remove the QP layer?

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Mon, 02 Nov 2020 14:58:01 GMT) Full text and rfc822 format available.

Message #29 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 44307 <at> debbugs.gnu.org, Thomas Schneider <qsx <at> chaotikum.eu>
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Mon, 02 Nov 2020 15:56:51 +0100
Andreas Schwab <schwab <at> linux-m68k.org> writes:

>> Even if I look at the raw message as posted to the bugs mailing list --
>> the uuencoded data doesn't seem right when I try to decode it.
>
> Did you remove the QP layer?

Nope; doing that I'm able to uudecode the mail.

I then tried displaying it as an nndoc group, and that seems to work
fine for me in Emacs 28.

Thomas, could you check whether this works in Emacs 28, or give more
precise instructions for how to reproduce the bug?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Sat, 02 Jan 2021 20:27:02 GMT) Full text and rfc822 format available.

Message #32 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Sat, 02 Jan 2021 21:26:30 +0100
Hi,

I'm back to using Gnus after a 15-year hiatus, and this is the first
issue I've noticed in several emails.

I am able to reproduce it with "emacs -Q" (version 27.1 from Debian) as
follows:

1. manually copy the uuencoded mail from
   https://debbugs.gnu.org/cgi/bugreport.cgi?bug=44307#5
   in example.uu
2. sed -i 's/ <at> /@/g' example.uu
3. uudecode example.uu
4. emacs -Q
5. M-x gnus        (ignoring any error)
6. G f example.eml
7. press RET on nndoc+/path/to/example.eml:example.eml
8. press RET on top-level message "<* alternative> text"

Doing so renders the html part by default, but displays "dddd" instead
of "ääää".

Clicking inside this message on the "Attachment: [2. text/plain]"
button inserts "\344\344\344\344", i.e., the Latin-1 encoding of
"ääää".  (M-x describe-char on these characters says that they
are "not encodable by coding system utf-8-unix".)

Typing "C latin-1" on the "[2. text/plain]" button 
displays the characters correctly.

Typing "C-u g" to display the raw article shows the utf-8 encoded
characters as \303\244\303\244\303\244\303\244.

So my understanding is that the mime parts, which are utf-8 encoded,
get somehow converted to latin-1 before being displayed as utf-8.
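
The byte values reported above can be checked with a short Python sketch.  The helper octal_escapes is hypothetical and merely imitates how Emacs prints raw bytes; the final remark about "dddd" is an arithmetic observation, not a mechanism confirmed anywhere in this report.

```python
def octal_escapes(data: bytes) -> str:
    """Render raw bytes as backslash-octal escapes, e.g. \\303\\244."""
    return "".join(f"\\{byte:03o}" for byte in data)


text = "\u00e4" * 4                      # "ääää"
utf8_bytes = text.encode("utf-8")        # what the raw article contains
latin1_bytes = text.encode("latin-1")    # what the text/plain button inserts

print(octal_escapes(utf8_bytes))         # \303\244\303\244\303\244\303\244
print(octal_escapes(latin1_bytes))       # \344\344\344\344

# The lone byte 0xE4 is not valid UTF-8, which fits the describe-char
# message "not encodable by coding system utf-8-unix".
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# Observation only: clearing the high bit of 0xE4 gives 0x64, i.e. "d".
seven_bit = bytes(byte & 0x7F for byte in latin1_bytes)
print(seven_bit)                         # b'dddd'
```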

-- 
Alexandre Duret-Lutz




Merged 44307 45657. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 04 Jan 2021 19:23:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Mon, 04 Jan 2021 21:55:02 GMT) Full text and rfc822 format available.

Message #37 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: 44307 <at> debbugs.gnu.org
Cc: larsi <at> gnus.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Mon, 04 Jan 2021 22:54:18 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:
> Clicking inside this message on the "Attachment: [2. text/plain]"
> button inserts "\344\344\344\344", i.e., the Latin-1 encoding of
> "ääää".  (M-x describe-char on these characters says that they
> are "not encodable by coding system utf-8-unix".)

Digging into the code, I believe that the unexpected conversion occurs in this macro:

(defmacro mm-with-part (handle &rest forms)
  "Run FORMS in the temp buffer containing the contents of HANDLE."
  ;; The handle-buffer's content is a sequence of bytes, not a sequence of
  ;; chars, so the buffer should be unibyte.  It may happen that the
  ;; handle-buffer is multibyte for some reason, in which case now is a good
  ;; time to adjust it, since we know at this point that it should
  ;; be unibyte.
  `(let* ((handle ,handle))
     (when (and (mm-handle-buffer handle)
		(buffer-name (mm-handle-buffer handle)))
       (with-temp-buffer
	 (mm-disable-multibyte)
	 (insert-buffer-substring (mm-handle-buffer handle))
	 (mm-decode-content-transfer-encoding
	  (mm-handle-encoding handle)
	  (mm-handle-media-type handle))
	 ,@forms))))


In my case the (mm-handle-buffer handle) is multibyte.  This
multibyteness was preserved by mm-copy-to-buffer while creating the
handle buffer, but I did not check the original source of it, since the
comment above the macro suggests that having multibyte parts is OK.

However the 

	 (mm-disable-multibyte)
	 (insert-buffer-substring (mm-handle-buffer handle))

seems to be doing harm.  The documentation of
insert-buffer-substring/insert notes that multibyte strings will be
converted by taking the lowest 8 bits of each multibyte character, not
by splitting those characters.

Mimicking it with

(require 'mm-util) ; defines mm-disable-multibyte
(let ((utf8string "ääää")) ; typed as utf8
  (with-temp-buffer
    (mm-disable-multibyte)
    (insert utf8string)
    (print (string-bytes utf8string))
    (print (string-bytes (buffer-string)))
    (buffer-string)))

this prints:

8
4
"\344\344\344\344"


So it would seem that (mm-disable-multibyte) should be called *after* the
insertion and not before, in order to preserve all bytes.

Does this make sense?

-- 
Alexandre Duret-Lutz




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Tue, 05 Jan 2021 09:31:01 GMT) Full text and rfc822 format available.

Message #40 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Tue, 05 Jan 2021 10:30:33 +0100
[Message part 1 (text/plain, inline)]
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> I am able to reproduce it with "emacs -Q" (version 27.1 from Debian) as
> follows:
>
> 1. manually copy the uuencoded mail from
>    https://debbugs.gnu.org/cgi/bugreport.cgi?bug=44307#5
>    in example.uu
> 2. sed -i 's/ <at> /@/g' example.uu
> 3. uudecode example.uu

This leaves me with a file that looks like:

[Message part 2 (image/png, inline)]
[Message part 3 (text/plain, inline)]
Which isn't valid.  Can you zip up the file and send it as an
attachment?


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Tue, 05 Jan 2021 10:02:02 GMT) Full text and rfc822 format available.

Message #43 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: 44307 <at> debbugs.gnu.org
Cc: larsi <at> gnus.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Tue, 05 Jan 2021 11:00:58 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:
> In my case the (mm-handle-buffer handle) is multibyte.  This
> multibyteness was preserved by mm-copy-to-buffer while creating the
> handle buffer, but I did not check the original source of it, since the
> comment above the macro suggests that having multibyte parts is OK.

I think I understand why the bug cannot be reproduced using nndoc in
emacs 28.  The following patch, which fixes the unibyteness of the nndoc
buffer is not part of emacs 27:

commit 9d0385d7c7adc810dfd06321b783593b7afb3d58
Author: Lars Ingebrigtsen <larsi <at> gnus.org>
Date:   Fri Aug 21 15:36:45 2020 +0200

    Fix problem with 8bit content-transfer-encoding in nndoc mbox files
    
    * lisp/gnus/nndoc.el (nndoc-possibly-change-buffer): If we're
    reading an mbox file, it may contain messages that use
    content-transfer-encoding 8bit, which means that we have to treat
    the file as a sequence of byte (bug#42951).  This avoids
    double-decoding -- once by Emacs when inserting the mbox into the
    buffer, and once by Gnus when displaying the articles.


So when mm-with-part processes the MIME parts of the problematic message
read from the nndoc group, it receives a multibyte buffer in Emacs 27
but a unibyte buffer in Emacs 28. 
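
A minimal sketch of that failure chain for a UTF-8 part, using only core
Emacs primitives as stand-ins for the Gnus code path (the function name
is hypothetical and does not exist in Gnus): the bytes are decoded once
on the way into a multibyte buffer, reduced to one byte per character
when copied into a unibyte buffer, and then decoded again as UTF-8.

```elisp
;; Illustrative only: plain Emacs primitives standing in for the backend
;; read, the copy done inside mm-with-part, and the charset decoding.
(defun bug44307-demo-double-decode ()
  "Return (SQUEEZED-BYTES FIRST-CHAR) for a twice-decoded UTF-8 \"ä\"."
  (let* ((on-disk (unibyte-string #xc3 #xa4)) ; UTF-8 bytes of "ä"
         ;; First decoding: what a multibyte handle buffer would hold.
         (decoded (decode-coding-string on-disk 'utf-8))
         ;; Copy into a unibyte buffer: one byte per character survives.
         (squeezed (with-temp-buffer
                     (set-buffer-multibyte nil)
                     (insert decoded)
                     (buffer-string)))
         ;; Second decoding, as the part's declared charset requests.
         (shown (decode-coding-string squeezed 'utf-8)))
    (list (string-to-list squeezed) ; the single byte #xe4
          (aref shown 0))))         ; a raw-byte character, not ?ä

(bug44307-demo-double-decode)
```

The lone byte #xe4 is not valid UTF-8, so the second decoding leaves it
as a raw byte, which Emacs displays as \344.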

However the issue is not restricted to nndoc.

With both emacs 27 & 28 I'm still having the original issue with mail
read using nnmaildir.  So maybe nnmaildir deserves a similar fix?
I'm not quite sure where that would go.  nnmaildir-request-article?

In any case, I'm currently running Gnus with the following change:

--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -1264,8 +1264,8 @@ mm-with-part
      (when (and (mm-handle-buffer handle)
                (buffer-name (mm-handle-buffer handle)))
        (with-temp-buffer
-        (mm-disable-multibyte)
         (insert-buffer-substring (mm-handle-buffer handle))
+        (mm-disable-multibyte)
         (mm-decode-content-transfer-encoding
          (mm-handle-encoding handle)
          (mm-handle-media-type handle))

and that seems to solve my problem in both versions of Emacs, with both
nndoc and nnmaildir.

-- 
Alexandre Duret-Lutz




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Tue, 05 Jan 2021 10:09:01 GMT) Full text and rfc822 format available.

Message #46 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Tue, 05 Jan 2021 11:07:52 +0100
[Message part 1 (text/plain, inline)]
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> Which isn't valid.  Can you zip up the file and send it as an
> attachment?

Here is the file.  But see my previous message from today.  Reading this
with nndoc will appear bogus with emacs 27.1, but not with emacs 28.
However if there is a way to read that mail with nnmaildir, you should
see the issue with both versions.

-- 
Alexandre Duret-Lutz

[example.zip (application/zip, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Tue, 05 Jan 2021 10:16:02 GMT) Full text and rfc822 format available.

Message #49 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Tue, 05 Jan 2021 11:14:51 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> Here is the file.  But see my previous message from today.  Reading this
> with nndoc will appear bogus with emacs 27.1, but not with emacs 28.
> However if there is a way to read that mail with nnmaildir, you should
> see the issue with both versions.

Thanks for the test file (and the analysis of the differences between
27.1 and 28 here).  I'll try to work on a fix for this in Emacs 27
tomorrow -- I'm not sure the simple fix you outlined won't have adverse
effects if the part in question is binary (i.e., an image or the like).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Tue, 05 Jan 2021 11:18:02 GMT) Full text and rfc822 format available.

Message #52 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Tue, 05 Jan 2021 12:17:38 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:
>
>> Here is the file.  But see my previous message from today.  Reading this
>> with nndoc will appear bogus with emacs 27.1, but not with emacs 28.
>> However if there is a way to read that mail with nnmaildir, you should
>> see the issue with both versions.
>
> Thanks for the test file (and the analysis of the differences between
> 27.1 and 28 here).  I'll try to work on a fix for this in Emacs 27
> tomorrow -- I'm not sure the simple fix you outlined won't have adverse
> effects if the part in question is binary (i.e., an image or the like).

Sorry, you can ignore my simple "fix": I have now found a case that it
breaks.

I was testing mainly on utf-8 messages.  But I've now looked (through
nnmaildir) at a mail containing a part encoded as follows:

Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit

However the rendering was garbled.  Looking at the output of "C-u g" I
can see that the buffer contains utf-8 code (I'm guessing because the
windows-1252 representation was decoded when nnmaildir read the mail
from file), and that will make no sense once Gnus attempts to decode
that in windows-1252.

Without my incorrect patch, displaying such an email would appear to
work (by luck?), because when mm-with-part inserts the multibyte buffer
into the unibyte buffer, the utf-8 characters are somehow converted to
their windows-1252 equivalent encoding.
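
A minimal sketch of why this can appear to work, assuming the documented
behaviour that inserting multibyte text into a unibyte buffer keeps only
the low 8 bits of each character code (the function name is
hypothetical): for characters below U+0100, those low 8 bits are also
the windows-1252 byte of the character.

```elisp
(defun bug44307-demo-low-eight-bits (text)
  "Return the bytes left after inserting multibyte TEXT in a unibyte buffer."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert text)
    (string-to-list (buffer-string))))

;; "äé": both characters are below U+0100, so the truncated code points
;; coincide with their windows-1252 encoding.
(let ((text "\u00e4\u00e9"))
  (equal (bug44307-demo-low-eight-bits text)
         (string-to-list (encode-coding-string text 'windows-1252))))
```

The coincidence does not extend to every character: the euro sign is
U+20AC but is encoded as byte #x80 in windows-1252, so it would not be
expected to survive the same route.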

I guess the real issue is that nnmaildir does not correctly deal with
multibyte files and may return a multibyte buffer, just like nndoc used to.
-- 
Alexandre Duret-Lutz




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Thu, 07 Jan 2021 13:44:02 GMT) Full text and rfc822 format available.

Message #55 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Thu, 07 Jan 2021 14:43:33 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> I think I understand why the bug cannot be reproduced using nndoc in
> emacs 28.  The following patch, which fixes the unibyteness of the nndoc
> buffer is not part of emacs 27:
>
> commit 9d0385d7c7adc810dfd06321b783593b7afb3d58
> Author: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date:   Fri Aug 21 15:36:45 2020 +0200

I've now backported this fix to Emacs 27.

> However the issue is not restricted to nndoc.
>
> With both emacs 27 & 28 I'm still having the original issue with mail
> read using nnmaildir.  So maybe nnmaildir deserves a similar fix?
> I'm not quite sure where that would go.  nnmaildir-request-article?

Me neither -- I'm wholly unfamiliar with nnmaildir, unfortunately, and
it seems like it's just using `nnheader-insert-file-contents' here (into
the " *nntpd*" buffer), so that's not the correct fix.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Thu, 07 Jan 2021 14:15:01 GMT) Full text and rfc822 format available.

Message #58 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Thu, 07 Jan 2021 15:14:01 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> Here is the file.  But see my previous message from today.  Reading this
> with nndoc will appear bogus with emacs 27.1, but not with emacs 28.
> However if there is a way to read that mail with nnmaildir, you should
> see the issue with both versions.

I've now committed a fix to mm-with-part that may or may not fix this
nnmaildir problem.  Can you try this (in Emacs 28)?  You may have to do
a "make bootstrap" or at least remove all the lisp/gnus/*.elc files for
the change to have any effect.

If it works for you in Emacs 28, we may backport it to Emacs 27.  I
think it should be safe (famous last words).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Thu, 07 Jan 2021 16:07:01 GMT) Full text and rfc822 format available.

Message #61 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Thu, 07 Jan 2021 17:06:44 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> I've now committed a fix to mm-with-part that may or may not fix this
> nnmaildir problem.  

Question: shouldn't mm-with-part always leave the buffer in unibyte
mode?  The comment at the beginning of the macro seems to suggest that,
but the new "if" does not call (mm-disable-multibyte) after inserting
the part.

Otherwise that would be just pushing the issue further away, to the next
place where the contents of mm-with-part will be inserted in a
unibyte buffer.

> Can you try this (in Emacs 28)?  You may have to do a "make bootstrap"
> or at least remove all the lisp/gnus/*.elc files for the change to
> have any effect.

After "make bootstrap", this seems to fix only the rendering of
text/html utf-8 parts (I'm using w3m, if that matters).  However
text/plain utf-8 parts are still garbled as they were before.

If I tweak the patch as follows:

--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -1271,7 +1271,9 @@ mm-with-part
             ;; multibyte buffer here, but if it's using an 8bit
             ;; Content-Transfer-Encoding, then work around that by
             ;; just ignoring the situation.
-            (insert-buffer-substring (mm-handle-buffer handle))
+            (progn
+              (insert-buffer-substring (mm-handle-buffer handle))
+              (mm-disable-multibyte))
           ;; Do the decoding.
           (mm-disable-multibyte)
           (insert-buffer-substring (mm-handle-buffer handle))

this seems to fix text/plain utf-8 parts as well, however the
rendering of windows-1252 parts is now broken...

See the following table, where "with patch" refers to
commit (23a887e4), and "disable-mb" to the above tweak.

|--------------+------------+---------------+------------+------------|
| charset      | type       | without patch | with patch | disable-mb |
|--------------+------------+---------------+------------+------------|
| utf-8        | text/html  | garbled       | ok         | ok         |
| windows-1252 | text/html  | ok            | ok         | garbled    |
| utf-8        | text/plain | garbled       | garbled    | ok         |
| windows-1252 | text/plain | ok            | ok         | garbled    |
|--------------+------------+---------------+------------+------------|

When looking at windows-1252-encoded mails read by nnmaildir, and
rendered using "C-u g" (where none of the above changes should matter),
it's obvious that the buffer contains utf-8 characters.

My guess is that when nnmaildir calls nnheader-insert-file-contents to
read the mail, it does so with 'undecided coding.  Emacs then
automatically detects windows-1252 and converts it to utf-8 for its
internal representation.
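
A minimal sketch of that guess, using core coding-system functions on a
single byte instead of a real nnmaildir article (the function name is
hypothetical; plain utf-8 stands in for the internal representation,
which matches it for this character): once the windows-1252 byte has
been decoded on read, treating the resulting bytes as windows-1252 a
second time gives mojibake.

```elisp
(defun bug44307-demo-decoded-on-read ()
  "Return (INTERNAL-BYTES REDECODED) for a windows-1252 \"ä\"."
  (let* ((raw (unibyte-string #xe4))                     ; "ä" on disk
         (text (decode-coding-string raw 'windows-1252)) ; decoded on read
         (internal (encode-coding-string text 'utf-8))   ; bytes C3 A4
         ;; Decoding those two bytes as windows-1252 a second time.
         (again (decode-coding-string internal 'windows-1252)))
    (list (string-to-list internal) again)))

(bug44307-demo-decoded-on-read)
```

The second element is the two-character string "Ã¤", not "ä".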
-- 
Alexandre Duret-Lutz




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Thu, 07 Jan 2021 16:11:01 GMT) Full text and rfc822 format available.

Message #64 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Thu, 07 Jan 2021 17:10:22 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> Question: shouldn't mm-with-part always leave the buffer in unibyte
> mode?  The comment at the beginning of the macro seems to suggest that,
> but the new "if" does not call (mm-disable-multibyte) after inserting
> the part.

Hm, true.

> -            (insert-buffer-substring (mm-handle-buffer handle))
> +            (progn
> +              (insert-buffer-substring (mm-handle-buffer handle))
> +              (mm-disable-multibyte))

No, disabling multibyte in a non-empty buffer is always the wrong thing
to do -- using encode-coding-region is probably the thing to do here.
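
A minimal sketch of that idea, using encode-coding-string (the string
counterpart of encode-coding-region) and a hypothetical helper name; it
is not the change that went into mm-with-part.  The text is encoded
explicitly first, and the unibyte buffer only ever receives bytes.

```elisp
(defun bug44307-demo-encode-to-bytes (text coding-system)
  "Return TEXT encoded in CODING-SYSTEM, as held by a unibyte buffer."
  (let ((bytes (encode-coding-string text coding-system)))
    (with-temp-buffer
      ;; The buffer is still empty here, so nothing gets reinterpreted.
      (set-buffer-multibyte nil)
      (insert bytes)
      (buffer-string))))

(string-to-list (bug44307-demo-encode-to-bytes "\u00e4" 'utf-8))
(string-to-list (bug44307-demo-encode-to-bytes "\u00e4" 'windows-1252))
```

The first call yields the two UTF-8 bytes (195 164) and the second the
single windows-1252 byte (228), so the result depends on the coding
system chosen rather than on how the buffer's multibyte flag was set.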

>            ;; Do the decoding.
>            (mm-disable-multibyte)
>            (insert-buffer-substring (mm-handle-buffer handle))
>
> this seems to fix text/plain utf-8 parts as well, however the
> rendering of windows-1252 parts is now broken...

Yeah, that's to be expected.

Could you forward a message with a windows-1252 part, too (like the
previous one), and I can add it to the test cases.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Thu, 07 Jan 2021 17:08:02 GMT) Full text and rfc822 format available.

Message #67 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Thu, 07 Jan 2021 18:07:46 +0100
[Message part 1 (text/plain, inline)]
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> Could you forward a message with a windows-1252 part, too (like the
> previous one), and I can add it to the test cases.

Here you are.  (I had to reencode it by hand, because typing C-o on the
original nnmaildir article would produce an utf-8 encoded file...)

To be clear: I have no issue viewing this with nndoc in any version of
emacs.  Only with the nnmaildir backend.

-- 
Alexandre Duret-Lutz
[win1252.zip (application/zip, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Sun, 10 Jan 2021 12:28:01 GMT) Full text and rfc822 format available.

Message #70 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Sun, 10 Jan 2021 13:27:23 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> After "make bootstrap", this seems to fix only the rendering of
> text/html utf-8 parts (I'm using w3m, if that matters).  However
> text/plain utf-8 parts are still garbled as they were before.

Yeah, it was a long shot.  What about the following patch?

diff --git a/lisp/gnus/nnmaildir.el b/lisp/gnus/nnmaildir.el
index e4fd976742..59926991b3 100644
--- a/lisp/gnus/nnmaildir.el
+++ b/lisp/gnus/nnmaildir.el
@@ -1351,7 +1351,8 @@ nnmaildir-request-article
 	(throw 'return nil))
       (with-current-buffer (or to-buffer nntp-server-buffer)
 	(erase-buffer)
-	(nnheader-insert-file-contents nnmaildir-article-file-name))
+	(let ((nnheader-file-coding-system nnmail-file-coding-system))
+	  (nnheader-insert-file-contents nnmaildir-article-file-name)))
       (cons gname num-msgid))))
 
 (defun nnmaildir-request-post (&optional _server)


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Sun, 10 Jan 2021 14:03:02 GMT) Full text and rfc822 format available.

Message #73 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Sun, 10 Jan 2021 15:02:27 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> What about the following patch?
>
> @@ -1351,7 +1351,8 @@ nnmaildir-request-article
> -	(nnheader-insert-file-contents nnmaildir-article-file-name))
> +	(let ((nnheader-file-coding-system nnmail-file-coding-system))
> +	  (nnheader-insert-file-contents nnmaildir-article-file-name)))

I was playing with something similar this morning:

@@ -1351,7 +1351,9 @@ nnmaildir-request-article
	(throw 'return nil))
       (with-current-buffer (or to-buffer nntp-server-buffer)
	(erase-buffer)
-	(nnheader-insert-file-contents nnmaildir-article-file-name))
+       (mm-disable-multibyte)
+	(let ((coding-system-for-read mm-text-coding-system))
+	  (nnheader-insert-file-contents nnmaildir-article-file-name)))
       (cons gname num-msgid))))

mm-text-coding-system and nnmail-file-coding-system both default
to 'raw-text.

Without (mm-disable-multibyte), the patch makes no difference to me.

The documentation for 'raw-text on
https://www.gnu.org/software/emacs/manual/html_node/emacs/Coding-Systems.html
states that 'raw-text causes enable-multibyte-characters to be set to
nil, but it's not clear when this should occur, and printing
enable-multibyte-characters after the call to
nnheader-insert-file-contents still shows t.

Adding (mm-disable-multibyte) to the patch seems to help a lot, although
the first impression is much worse:

1. When a mail is first displayed (using RET or g), the article buffer
   is unibyte with all non-ascii characters displayed as backslash
   sequences.  This occurs for all mails, even QP-encoded ones.

2. When a mail is displayed for the second time (using g on the same
   article or RET to change article and come back), the display is
   *perfect*.  I.e., text/plain and text/html parts that are encoded
   with either utf-8 or windows-1252 are correctly displayed for me.

3. Running M-x gnus-backlog-shutdown gets me back to 1. where
   all non-ascii characters are displayed as backslashes.


PS: all of this is with an updated emacs 28, including the reverted
mm-with-part change.

--
Alexandre Duret-Lutz




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Sun, 10 Jan 2021 14:12:02 GMT) Full text and rfc822 format available.

Message #76 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Sun, 10 Jan 2021 15:11:13 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> Without (mm-disable-multibyte), the patch makes no difference to me.

Darn.  The multibyteness here isn't what you should be looking at,
though -- the " *nntpd*" buffer is multibyte, so it's all gonna end up
in that state, anyway.  We just have to trick Emacs into not
interpreting the files as text, but as bytes, which is what nnml's
request-article function does, for instance.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Sun, 10 Jan 2021 14:49:01 GMT) Full text and rfc822 format available.

Message #79 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Sun, 10 Jan 2021 15:48:35 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:
>
>> Without (mm-disable-multibyte), the patch makes no difference to me.
>
> Darn.  The multibyteness here isn't what you should be looking at,
> though -- the " *nntpd*" buffer is multibyte, so it's all gonna end up
> in that state, anyway.  We just have to trick Emacs into not
> interpreting the files as text, but as bytes, which is what nnml's
> request-article function does, for instance.

The following patch seems to fix my rendering issues.

I don't understand why nnheader-insert-file-contents with 'raw-text
coding does not seem to work as desired on a multibyte-buffer.  I'll try
to play with it more.

This patch has the potential side effect of leaving to-buffer as
multibyte even if it was initially unibyte.  I don't know if this could
be an issue.  You seem to suggest those buffers are meant to be multibyte
anyway.

You've mentioned the " *nntpd*" buffer, but in my case this code seems
to be always using to-buffer ("*Article nnmaildir+gmail:Inbox*").  Not
sure if this matters.


diff --git a/lisp/gnus/nnmaildir.el b/lisp/gnus/nnmaildir.el
index e4fd976742..ca2b0e1295 100644
--- a/lisp/gnus/nnmaildir.el
+++ b/lisp/gnus/nnmaildir.el
@@ -1351,7 +1351,10 @@ nnmaildir-request-article
 	(throw 'return nil))
       (with-current-buffer (or to-buffer nntp-server-buffer)
 	(erase-buffer)
-	(nnheader-insert-file-contents nnmaildir-article-file-name))
+	(mm-disable-multibyte)
+	(let ((coding-system-for-read mm-text-coding-system))
+	  (nnheader-insert-file-contents nnmaildir-article-file-name))
+	(mm-enable-multibyte))
       (cons gname num-msgid))))
 
 (defun nnmaildir-request-post (&optional _server)


-- 
Alexandre Duret-Lutz




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Sun, 10 Jan 2021 15:22:02 GMT) Full text and rfc822 format available.

Message #82 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Sun, 10 Jan 2021 16:21:08 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> I don't understand why nnheader-insert-file-contents with 'raw-text
> coding does not seem to work as desired on a multibyte-buffer.

OK, that was silly: nnheader-insert-file-contents resets
coding-system-for-read to 'undecided.
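
A minimal sketch of what the binding is meant to achieve, using plain
insert-file-contents on a temporary file as a stand-in for the Gnus
helpers (the function name is hypothetical): with raw-text each byte of
the file stays a separate element of the buffer, while a decoding
coding system merges the two UTF-8 bytes into one character.

```elisp
(defun bug44307-demo-read-size (bytes coding-system)
  "Return the buffer size after reading BYTES back with CODING-SYSTEM."
  (let ((file (make-temp-file "bug44307-demo")))
    (unwind-protect
        (progn
          ;; Write the unibyte string BYTES to FILE without conversion.
          (let ((coding-system-for-write 'binary))
            (write-region bytes nil file nil 'silent))
          (with-temp-buffer
            (let ((coding-system-for-read coding-system))
              (insert-file-contents file))
            (buffer-size)))
      (delete-file file))))

(bug44307-demo-read-size (unibyte-string #xc3 #xa4) 'utf-8)
(bug44307-demo-read-size (unibyte-string #xc3 #xa4) 'raw-text)
```

The first call returns 1 (one decoded character) and the second returns
2 (two undecoded bytes), which leaves the decoding to whatever later
interprets the part's declared charset.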

The following also seems to fix the rendering of all my mails.

diff --git a/lisp/gnus/nnmaildir.el b/lisp/gnus/nnmaildir.el
index e4fd976742..2a4c74db5e 100644
--- a/lisp/gnus/nnmaildir.el
+++ b/lisp/gnus/nnmaildir.el
@@ -1351,7 +1351,8 @@ nnmaildir-request-article
 	(throw 'return nil))
       (with-current-buffer (or to-buffer nntp-server-buffer)
 	(erase-buffer)
-	(nnheader-insert-file-contents nnmaildir-article-file-name))
+	(let ((coding-system-for-read mm-text-coding-system))
+	  (mm-insert-file-contents nnmaildir-article-file-name)))
       (cons gname num-msgid))))
 
 (defun nnmaildir-request-post (&optional _server)



-- 
Alexandre Duret-Lutz




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Mon, 11 Jan 2021 14:30:02 GMT) Full text and rfc822 format available.

Message #85 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Mon, 11 Jan 2021 15:28:45 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> OK, that was silly: nnheader-insert-file-contents resets
> coding-system-for-read to 'undecided.
>
> The following also seems to fix the rendering of all my mails.

Thanks; applied to Emacs 28.  If nobody reports any problems, we'll
backport to Emacs 27.2.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Tue, 02 Feb 2021 11:37:01 GMT) Full text and rfc822 format available.

Message #88 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Tue, 02 Feb 2021 12:36:48 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> Thanks; applied to Emacs 28.  If nobody reports any problems, we'll
> backport to Emacs 27.2.

Hi Lars,

I don't know what the timeline for 27.2 is, but I just saw
the announcement for 27.1.91.

Any chance to backport this patch? (6129ebf4)

Thanks!




Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org:
bug#44307; Package emacs,gnus. (Thu, 04 Feb 2021 08:05:01 GMT) Full text and rfc822 format available.

Message #91 received at 44307 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
Cc: 44307 <at> debbugs.gnu.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Thu, 04 Feb 2021 09:04:39 +0100
Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:

> I don't know what the timeline for 27.2 is, but I just saw
> the announcement for 27.1.91.
>
> Any chance to backport this patch? (6129ebf4)

Yup; now backported (as there's been no reports about it being
problematic on the trunk).

I'm not sure whether it'll make it into 27.2, though (i.e., if there's
going to be more pretests).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Added tag(s) fixed. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 04 Feb 2021 08:06:02 GMT) Full text and rfc822 format available.

bug marked as fixed in version 28.1, send any further explanations to 44307 <at> debbugs.gnu.org and Thomas Schneider <qsx <at> chaotikum.eu> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 04 Feb 2021 08:06:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 04 Mar 2021 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 53 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.