GNU bug report logs - #61005
28.1.91; Encoding not detected in HTML files inside archives


Package: emacs;

Reported by: Benjamin Riefenstahl <b.riefenstahl <at> turtle-trading.net>

Date: Sun, 22 Jan 2023 13:15:01 UTC

Severity: normal

Found in version 28.1.91

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 61005 in the body.
You can then email your comments to 61005 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#61005; Package emacs. (Sun, 22 Jan 2023 13:15:02 GMT)

Acknowledgement sent to Benjamin Riefenstahl <b.riefenstahl <at> turtle-trading.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 22 Jan 2023 13:15:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Benjamin Riefenstahl <b.riefenstahl <at> turtle-trading.net>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.1.91; Encoding not detected in HTML files inside archives
Date: Sun, 22 Jan 2023 14:13:50 +0100

Problem
----

* Given an HTML file with charset "windows-1255". 

* Opening the file from disk detects the encoding correctly.

* Opening a ZIP archive with the same file inside and then opening the
  HTML archive member does not detect the encoding.  Instead, the coding
  system for saving is the default, according to M-x
  describe-coding-system.

Attached are two files test.html and test.zip.  Call "emacs -Q test.html
test.zip" and press RET on the archive member to reproduce.
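
To compare the two buffers without reading through the
describe-coding-system output, something like the following could be
evaluated once and then run in each buffer.  This is only a sketch;
my/show-buffer-coding is a hypothetical helper name, not part of Emacs.

```elisp
;; Hypothetical helper (not part of Emacs): report the coding system
;; that the current buffer would be saved with.
(defun my/show-buffer-coding ()
  "Echo and return the buffer name and `buffer-file-coding-system'."
  (interactive)
  (let ((text (format "%s: %S" (buffer-name) buffer-file-coding-system)))
    (message "%s" text)
    text))
```

Run M-x my/show-buffer-coding once in the buffer visiting test.html
from disk and once in the archive member buffer.  With this bug
present, the two should report different coding systems.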

[test.html (text/html, attachment)]
[test.zip (application/zip, attachment)]

Solution
----

The problem seems to be in the function
sgml-html-meta-auto-coding-function.  It is missing a condition similar
to the one added to sgml-xml-auto-coding-function with commit
#df7ed10e in 2018.

modified   lisp/international/mule.el
@@ -2539,6 +2539,10 @@ sgml-html-meta-auto-coding-function
                   (bfcs-type
                    (coding-system-type buffer-file-coding-system)))
               (if (and enable-multibyte-characters
+                       ;; 'charset' will signal an error in
+                       ;; coding-system-equal, since it isn't a
+                       ;; coding-system.  So test that up front.
+                       (not (equal sym-type 'charset))
                        (coding-system-equal 'utf-8 sym-type)
                        (coding-system-equal 'utf-8 bfcs-type))
                   buffer-file-coding-system

I will send this as a patch as soon as I have a bug number to mention in
the commit message.
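
The comment in the hunk above can be checked in isolation.  The sketch
below assumes, as that comment states, that the symbol charset is a
coding type but not a coding system.  The names my/probe-coding-type
and my/utf-8-type-p are hypothetical and exist only for illustration;
they are not part of Emacs or of the patch.

```elisp
;; `coding-system-type' returns a coding *type* symbol.  For
;; windows-1255 that type is `charset', which (per the patch comment)
;; is not itself a coding system.
(defun my/probe-coding-type (coding-system)
  "Return (TYPE TYPE-IS-CODING-SYSTEM RESULT) for CODING-SYSTEM.
RESULT is the value of comparing TYPE with `utf-8' via
`coding-system-equal', or the symbol `signaled' if that call
signals an error."
  (let ((type (coding-system-type coding-system)))
    (list type
          (coding-system-p type)
          (condition-case nil
              (coding-system-equal 'utf-8 type)
            (error 'signaled)))))

;; The guarded comparison from the patch, pulled out on its own.
(defun my/utf-8-type-p (type)
  "Return non-nil if coding TYPE is `utf-8'; skip the check for `charset'."
  (and (not (equal type 'charset))
       (coding-system-equal 'utf-8 type)))

(my/probe-coding-type 'windows-1255)
(my/utf-8-type-p (coding-system-type 'windows-1255)) ; nil, no error
(my/utf-8-type-p (coding-system-type 'utf-8))        ; t
```

If the assumption holds, the probe should report the type charset, say
that it is not a coding system, and show the unguarded comparison
signaling an error, while the guarded version simply returns nil.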

----

In GNU Emacs 28.1.91 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
 of 2022-08-29 built on arrian
Repository revision: f4168b8143008b787a11366462c928d761e90dd0
Repository branch: emacs-28
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBXML2 M17N_FLT MODULES NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE
XIM XPM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Dired by date

Minor modes in effect:
  shell-dirtrack-mode: t
  desktop-save-mode: t
  display-time-mode: t
  xclip-mode: t
  xterm-mouse-mode: t
  delete-selection-mode: t
  cua-mode: t
  display-battery-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
~/Projects/ttf-mode/arc-mode-compat hides ~/emacs/arc-mode-compat
/home/benny/.emacs.d/elpa/transient-20210723.1601/transient hides /usr/local/share/emacs/28.1.91/lisp/transient
/home/benny/.emacs.d/elpa/dictionary-20201001.1727/dictionary hides /usr/local/share/emacs/28.1.91/lisp/net/dictionary

Features:
(shadow sort mail-extr emacsbug message rmc puny rfc822 mml mml-sec epa
epg rfc6068 epg-config gnus-util rmail rmail-loaddefs mm-decode
mm-bodies mm-encode mailabbrev gmm-utils mailheader arc-mode
archive-mode benny-images dirtrack shell pcomplete misearch
multi-isearch thai-util thai-word lao-util enriched view tabify
benny-auto-insert ttf-glyphs rng-xsd xsd-regexp rng-cmpct rng-nxml
rng-valid rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util
rng-pttrn nxml-ns nxml-mode nxml-outln nxml-rap sgml-mode facemenu dom
nxml-util nxml-enc xmltok mule-util jka-compr dired-aux time-date
bug-reference imenu desktop frameset highline benny-calendar-cfg
ange-ftp generic-x autoinsert cc-mode cc-fonts cc-guess cc-menus
cc-styles cc-align cc-cmds cc-engine cc-vars cc-defs ps-print
ps-print-loaddefs ps-def lpr advice cl-extra help-mode dired
dired-loaddefs derived benny-x-clipboard disp-table time server protbuf
xclip term/xterm xterm xt-mouse cal-china lunar solar cal-dst cal-bahai
cal-islam cal-hebrew holidays hol-loaddefs vc-git diff-mode easy-mmode
vc-dispatcher vc-fossil diary-lib diary-loaddefs cal-menu calendar
cal-loaddefs delsel grep compile text-property-search comint ansi-color
ring cua-base cus-load format-spec battery dbus xml sendmail mail-utils
.loaddefs benny-tools autoload radix-tree lisp-mnt mail-parse rfc2231
rfc2047 rfc2045 mm-util ietf-drums mail-prsvr edmacro kmacro info
package browse-url url url-proxy url-privacy url-expand url-methods
url-history url-cookie url-domsuf url-util mailcap url-handlers
url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache json subr-x map url-vars seq byte-opt gv bytecomp
byte-compile cconv cl-loaddefs cl-lib iso-transl tooltip eldoc paren
electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice
button loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote threads dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting cairo
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 273770 13520)
 (symbols 48 18619 1)
 (strings 32 66582 2920)
 (string-bytes 1 2318045)
 (vectors 16 39996)
 (vector-slots 8 1131973 174560)
 (floats 8 762 66)
 (intervals 56 1039 60)
 (buffers 992 50))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#61005; Package emacs. (Sun, 22 Jan 2023 13:25:02 GMT)

Message #8 received at 61005 <at> debbugs.gnu.org:

From: Benjamin Riefenstahl <b.riefenstahl <at> turtle-trading.net>
To: 61005 <at> debbugs.gnu.org
Subject: Re: bug#61005: 28.1.91; Encoding not detected in HTML files inside
 archives
Date: Sun, 22 Jan 2023 14:24:07 +0100

The promised patch.  This is against master.

Also a small test-suite for sgml-html-meta-auto-coding-function, if you
want that.  If you care, I could also add one for
sgml-xml-auto-coding-function.
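
The attached test patch is not reproduced in this log.  As a rough
idea of what such a test can look like, here is a minimal ERT sketch.
The test name and the helper my/html-meta-coding are hypothetical, and
the HTML snippet is made up for illustration rather than taken from
test.html.

```elisp
(require 'ert)

;; Hypothetical helper: run the auto-coding function on a minimal
;; HTML document that declares CHARSET in a <meta> tag.
(defun my/html-meta-coding (charset)
  "Return what `sgml-html-meta-auto-coding-function' detects for CHARSET."
  (with-temp-buffer
    ;; Temp buffers are multibyte by default, so the function reaches
    ;; the comparison that the patched condition guards.
    (insert "<html>\n<head>\n"
            (format "<meta charset=\"%s\">\n" charset)
            "</head>\n<body></body>\n</html>\n")
    (goto-char (point-min))
    (sgml-html-meta-auto-coding-function (- (point-max) (point-min)))))

(ert-deftest my/html-meta-windows-1255 ()
  "A non-UTF-8 charset in the meta tag is returned as declared."
  (should (eq 'windows-1255 (my/html-meta-coding "windows-1255"))))
```

After evaluating this, M-x ert RET my/html-meta-windows-1255 RET runs
the test.  On an Emacs without the fix it is expected to fail with an
error instead of returning the symbol.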

[0001-Fix-decoding-HTML-files-from-archives.patch (text/x-diff, attachment)]
[0002-Add-test-suite-for-sgml-html-meta-auto-coding-functi.patch (text/x-diff, attachment)]

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sun, 22 Jan 2023 14:11:01 GMT)

Notification sent to Benjamin Riefenstahl <b.riefenstahl <at> turtle-trading.net>:
bug acknowledged by developer. (Sun, 22 Jan 2023 14:11:01 GMT)

Message #13 received at 61005-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Benjamin Riefenstahl <b.riefenstahl <at> turtle-trading.net>
Cc: 61005-done <at> debbugs.gnu.org
Subject: Re: bug#61005: 28.1.91;
 Encoding not detected in HTML files inside archives
Date: Sun, 22 Jan 2023 16:09:47 +0200
> From: Benjamin Riefenstahl <b.riefenstahl <at> turtle-trading.net>
> Date: Sun, 22 Jan 2023 14:24:07 +0100
> 
> The promised patch.  This is against master.
> 
> Also a small test-suite for sgml-html-meta-auto-coding-function, if you
> want that.  If you care, I could also add one for
> sgml-xml-auto-coding-function.

Thanks, I installed this on the emacs-29 branch, and I'm closing the
bug.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 20 Feb 2023 12:24:04 GMT)



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.