GNU bug report logs - #63644
29.0.91; Coding system detection defect in html

Package: emacs

Reported by: Ikumi Keita <ikumi <at> ikumi.que.jp>

Date: Mon, 22 May 2023 14:00:02 UTC

Severity: normal

Found in version 29.0.91

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 63644 in the body.
You can then email your comments to 63644 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#63644; Package emacs. (Mon, 22 May 2023 14:00:02 GMT)

Acknowledgement sent to Ikumi Keita <ikumi <at> ikumi.que.jp>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 22 May 2023 14:00:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ikumi Keita <ikumi <at> ikumi.que.jp>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.91; Coding system detection defect in html
Date: Mon, 22 May 2023 22:59:23 +0900
The function `sgml-html-meta-auto-coding-function' signals an error for an
HTML file with a legacy encoding specification.

0. Save the following file as /tmp/foo.html with the coding system `euc-jp':
----------------------------------------------------------------------
<!DOCTYPE html>
<html lang="ja">
<head>
<meta charset="EUC-JP">
<title>dummy</title>
</head>
<body>
あいうえお
</body></html>
----------------------------------------------------------------------
1. emacs -Q
2. C-x C-f /tmp/foo.html RET
3. M-: (sgml-html-meta-auto-coding-function 1000) RET
4. Then emacs signals error with the following backtrace:
Debugger entered--Lisp error: (coding-system-error iso-2022)
  coding-system-plist(iso-2022)
  coding-system-equal(utf-8 iso-2022)
  sgml-html-meta-auto-coding-function(1000)
  eval((sgml-html-meta-auto-coding-function 1000) t)
  eval-expression((sgml-html-meta-auto-coding-function 1000) nil nil 127)
  funcall-interactively(eval-expression (sgml-html-meta-auto-coding-function 1000) nil nil 127)
  call-interactively(eval-expression nil nil)
  command-execute(eval-expression)

It seems that this error is due to a change in
`sgml-html-meta-auto-coding-function' introduced in Emacs 27. When I use
the Emacs 26.1 definition of the function, it returns `euc-jp' as expected.

Regards,
Ikumi Keita
#StandWithUkraine #StopWarInUkraine

In GNU Emacs 29.0.91 (build 1, x86_64-unknown-freebsd13.2, GTK+ Version
 3.24.34, cairo version 1.17.4) of 2023-05-22 built on freebsd.vmware
Windowing system distributor 'The X.Org Foundation', version 11.0.12101007
System Description: 13.2-RELEASE

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LCMS2 LIBXML2 MODULES NOTIFY KQUEUE PDUMPER PNG RSVG SOUND SQLITE3
THREADS TIFF TOOLKIT_SCROLL_BARS WEBP X11 XDBE XIM XINPUT2 XPM GTK3 ZLIB

Important settings:
  value of $EMACSLOADPATH: /home/keita/elisp:
  value of $LANG: ja_JP.UTF-8
  locale-coding-system: utf-8-unix

Major mode: HTML+

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
/home/keita/elisp/reftex-parse hides /home/keita/scr/emacs-29.0.91/lisp/textmodes/reftex-parse

Features:
(shadow sort mail-extr emacsbug message dired dired-loaddefs rfc822 mml
mml-sec epa derived epg rfc6068 epg-config mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047
rfc2045 ietf-drums debug backtrace find-func cl-extra pp cl-print
help-fns radix-tree help-mode yank-media mhtml-mode css-mode smie eww
xdg url-queue thingatpt shr pixel-fill kinsoku url-file svg xml
browse-url url url-proxy url-privacy url-expand url-methods url-history
url-cookie generate-lisp-file url-domsuf url-util url-parse auth-source
eieio eieio-core cl-macs password-cache url-vars mailcap puny mm-url
gnus nnheader gnus-util text-property-search time-date mail-utils range
wid-edit mm-util mail-prsvr color js c-ts-common treesit cl-seq json
subr-x map byte-opt gv bytecomp byte-compile imenu cc-mode cc-fonts
cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs
sgml-mode facemenu dom cl-loaddefs cl-lib japan-util rmc iso-transl
tooltip cconv eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq
simple cl-generic indonesian philippine cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button
loaddefs theme-loaddefs faces cus-face macroexp files window
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget keymap hashtable-print-readable backquote threads dbusbind
kqueue lcms2 dynamic-setting system-font-setting font-render-setting
cairo move-toolbar gtk x-toolkit xinput2 x multi-tty
make-network-process emacs)

Memory information:
((conses 16 122445 10073)
 (symbols 48 12761 0)
 (strings 32 41719 1774)
 (string-bytes 1 1318626)
 (vectors 16 23622)
 (vector-slots 8 397515 14753)
 (floats 8 154 34)
 (intervals 56 350 0)
 (buffers 976 15))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63644; Package emacs. (Mon, 22 May 2023 16:04:01 GMT)

Message #8 received at 63644 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ikumi Keita <ikumi <at> ikumi.que.jp>
Cc: 63644 <at> debbugs.gnu.org
Subject: Re: bug#63644: 29.0.91; Coding system detection defect in html
Date: Mon, 22 May 2023 19:04:02 +0300
> From: Ikumi Keita <ikumi <at> ikumi.que.jp>
> Date: Mon, 22 May 2023 22:59:23 +0900
> 
> 0. Save the following file as /tmp/foo.html with the coding system `euc-jp':
> ----------------------------------------------------------------------
> <!DOCTYPE html>
> <html lang="ja">
> <head>
> <meta charset="EUC-JP">
> <title>dummy</title>
> </head>
> <body>
> あいうえお
> </body></html>
> ----------------------------------------------------------------------
> 1. emacs -Q
> 2. C-x C-f /tmp/foo.html RET
> 3. M-: (sgml-html-meta-auto-coding-function 1000) RET
> 4. Then emacs signals error with the following backtrace:
> Debugger entered--Lisp error: (coding-system-error iso-2022)
>   coding-system-plist(iso-2022)
>   coding-system-equal(utf-8 iso-2022)
>   sgml-html-meta-auto-coding-function(1000)
>   eval((sgml-html-meta-auto-coding-function 1000) t)
>   eval-expression((sgml-html-meta-auto-coding-function 1000) nil nil 127)
>   funcall-interactively(eval-expression (sgml-html-meta-auto-coding-function 1000) nil nil 127)
>   call-interactively(eval-expression nil nil)
>   command-execute(eval-expression)

Thanks.  Does the patch below give good results?

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 25b90b4..2b44a2e 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2484,10 +2484,12 @@ sgml-xml-auto-coding-function
                     ;; called as part of visiting a file, as opposed
                     ;; to when saving a buffer to a file.
                     (if (and enable-multibyte-characters
-                             ;; 'charset' will signal an error in
-                             ;; coding-system-equal, since it isn't a
-                             ;; coding-system.  So test that up front.
+                             ;; 'charset' and 'iso-2022' will signal
+                             ;; an error in coding-system-equal, since
+                             ;; they aren't coding-systems.  So test
+                             ;; that up front.
                              (not (equal sym-type 'charset))
+                             (not (equal sym-type 'iso-2022))
                              (coding-system-equal 'utf-8 sym-type)
                              (coding-system-equal 'utf-8 bfcs-type))
                         buffer-file-coding-system
@@ -2540,11 +2542,13 @@ sgml-html-meta-auto-coding-function
                   (bfcs-type
                    (coding-system-type buffer-file-coding-system)))
               (if (and enable-multibyte-characters
-                       ;; 'charset' will signal an error in
-                       ;; coding-system-equal, since it isn't a
-                       ;; coding-system.  So test that up front.
+                       ;; 'charset' and 'iso-2022' will signal an error
+                       ;; in coding-system-equal, since they aren't
+                       ;; coding-systems.  So test that up front.
                        (not (equal sym-type 'charset))
                        (not (equal bfcs-type 'charset))
+                       (not (equal sym-type 'iso-2022))
+                       (not (equal bfcs-type 'iso-2022))
                        (coding-system-equal 'utf-8 sym-type)
                        (coding-system-equal 'utf-8 bfcs-type))
                   buffer-file-coding-system
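
The failure mode that the patch guards against can be sketched in a few
lines. `coding-system-type' returns a type symbol such as `iso-2022' for
`euc-jp', and that symbol does not name a coding system, so passing it to
`coding-system-equal' signals `coding-system-error'. The helper below,
`my/safe-utf-8-type-p', is hypothetical and is not part of Emacs or of the
patch; it only illustrates an alternative guard that tests the value with
`coding-system-p' instead of enumerating the problematic symbols.

```elisp
;; Sketch only: `my/safe-utf-8-type-p' is a hypothetical helper name,
;; not something defined in mule.el.
(defun my/safe-utf-8-type-p (type)
  "Return non-nil if TYPE names a coding system equivalent to `utf-8'.
TYPE is expected to be a value returned by `coding-system-type'.
Return nil, rather than signaling an error, when TYPE is a type
symbol such as `iso-2022' or `charset' that is not a coding system."
  (and (coding-system-p type)
       (coding-system-equal 'utf-8 type)))

;; The type of `euc-jp' is the symbol `iso-2022', which is what appears
;; in the backtrace of the report.
(coding-system-type 'euc-jp)

;; With the guard, the comparison yields nil instead of signaling.
(my/safe-utf-8-type-p (coding-system-type 'euc-jp))

;; For `utf-8' itself the type is `utf-8', so the result is non-nil.
(my/safe-utf-8-type-p (coding-system-type 'utf-8))
```

The patch above instead lists the known non-coding-system types explicitly,
which keeps the change minimal for the release branch.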




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63644; Package emacs. (Mon, 22 May 2023 16:43:02 GMT)

Message #11 received at 63644 <at> debbugs.gnu.org:

From: Ikumi Keita <ikumi <at> ikumi.que.jp>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63644 <at> debbugs.gnu.org
Subject: Re: bug#63644: 29.0.91; Coding system detection defect in html
Date: Tue, 23 May 2023 01:42:39 +0900
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
> Thanks.  Does the patch below give good results?

Yes. It returns `euc-jp' as expected.

Regards,
Ikumi Keita
#StandWithUkraine #StopWarInUkraine




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Mon, 22 May 2023 18:26:02 GMT)

Notification sent to Ikumi Keita <ikumi <at> ikumi.que.jp>:
bug acknowledged by developer. (Mon, 22 May 2023 18:26:02 GMT)

Message #16 received at 63644-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ikumi Keita <ikumi <at> ikumi.que.jp>
Cc: 63644-done <at> debbugs.gnu.org
Subject: Re: bug#63644: 29.0.91; Coding system detection defect in html
Date: Mon, 22 May 2023 21:25:20 +0300
> From: Ikumi Keita <ikumi <at> ikumi.que.jp>
> cc: 63644 <at> debbugs.gnu.org
> Comments: In-reply-to Eli Zaretskii <eliz <at> gnu.org>
>    message dated "Mon, 22 May 2023 19:04:02 +0300."
> Date: Tue, 23 May 2023 01:42:39 +0900
> 
> >>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
> > Thanks.  Does the patch below give good results?
> 
> Yes. It returns `euc-jp' as expected.

Thanks, installed on the emacs-29 branch, and closing the bug.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 Jun 2023 11:24:07 GMT)
