GNU logs - #77410, boring messages


Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#77410: term.el sometimes prints undecoded multibyte UTF-8 chars
Resent-From: Stephane Zermatten <szermatt@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Mon, 31 Mar 2025 17:46:02 +0000
Resent-Message-ID: <handler.77410.B.174344311718238 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 77410
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
To: 77410 <at> debbugs.gnu.org
Cc: szermatt@HIDDEN
X-Debbugs-Original-To: bug-gnu-emacs@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.174344311718238
          (code B ref -1); Mon, 31 Mar 2025 17:46:02 +0000
Received: (at submit) by debbugs.gnu.org; 31 Mar 2025 17:45:17 +0000
Received: from localhost ([127.0.0.1]:42745 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1tzJCG-0004jt-6F
	for submit <at> debbugs.gnu.org; Mon, 31 Mar 2025 13:45:17 -0400
Received: from lists.gnu.org ([2001:470:142::17]:51120)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <szermatt@HIDDEN>)
 id 1tzFyW-0002Od-Po
 for submit <at> debbugs.gnu.org; Mon, 31 Mar 2025 10:18:54 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <szermatt@HIDDEN>)
 id 1tzFyP-0006KW-0m
 for bug-gnu-emacs@HIDDEN; Mon, 31 Mar 2025 10:18:46 -0400
Received: from mail-wm1-x334.google.com ([2a00:1450:4864:20::334])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <szermatt@HIDDEN>)
 id 1tzFyM-0005lB-IO
 for bug-gnu-emacs@HIDDEN; Mon, 31 Mar 2025 10:18:44 -0400
Received: by mail-wm1-x334.google.com with SMTP id
 5b1f17b1804b1-43ea40a6e98so4136245e9.1
 for <bug-gnu-emacs@HIDDEN>; Mon, 31 Mar 2025 07:18:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20230601; t=1743430720; x=1744035520; darn=gnu.org;
 h=mime-version:message-id:date:cc:subject:to:from:sender:from:to:cc
 :subject:date:message-id:reply-to;
 bh=LJCwQxRVD3ss4lriMEK40lWXBYNmRdnxnDLfdF/0gzs=;
 b=OGI1q+13vnESkhu7uh+gcsGnZwNHppQ8pllEGmkZMJL15T1uI/3jmSBiOWBOatvNMY
 EWK++U/sUxDIgsIPkxYRig3JGP9UDYNiYrFJ6g3ne7zxcUujyhzFZz6kVRYNKwnzJx2Q
 mZcPJSFG25FGTPaqArUsc1QevUQdZ1CEdRdWLo8x/s8apC7capl0BPUEop5Am+/Kmany
 1gLXiBg6n7guCmMtne8jlRKye+n1NzKAfKlX9i2LOJqygVpEwxODFK2DuOYpDlQ3BVcR
 O3y7heVu+VJcm8pRcfyKoTXfLVe5mwt81EcyVma8lUEee56M0vZG703Cy9hW6ToH2lLV
 /qAQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1743430720; x=1744035520;
 h=mime-version:message-id:date:cc:subject:to:from:sender
 :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
 bh=LJCwQxRVD3ss4lriMEK40lWXBYNmRdnxnDLfdF/0gzs=;
 b=GfAUbtxsLgaQUqO+qVwDQGs4WSPERMPVSEL4FHizH+u+Ykl5jHICWlYMIgmc2ph7DW
 Uh5SeZBaE2+t9HuLFIOV142o/xQ0hDU2AuF2pEMQHMT5xETw1hPU5PPwFS/FzH4Ktu8n
 BdCbR3wrYj91zOjaWxcpkyXmQjAHQ+9OptjB5TxzThGxQ5EynZBQZmX/gH0tvK+g+/Gf
 kYbKmWEYA03emEuUoTPkJsm9A+cdxfaBh8+WSUpzu/XTiOXi5qQxuqAMvKQ36Hi5b4UR
 VtcbmyAQhuGyuLqJKH2aPxoO1LqTuaDD3h/qYa5MDbWrD/6A7/gfPYd2lPffZp/4gFb5
 Swxw==
X-Gm-Message-State: AOJu0Yxs/CKzc1bw5strkcVfNvko6oKLtlHvQcZd52PM0ljyr+Fhi6EU
 66vEL7heGZUym5GJas5E3p75kBiYdhWmxyVDibtE4R0hVQKIBmCkqWqhoiqY
X-Gm-Gg: ASbGncuvMjmKnAVikp+vV2Wf7TQd7Zg0ymOORH+aE79me5GAURmf4uF1Piqo+KRmoyX
 0keNGjNPjsTlLpQZoM0ZS9zIV3ftGOJPJM+9Bo/7Qb+6glakbHmpUYpTerXrKcIT1djZGqXtuha
 4e1XeQCKEDK7Y5ZU+aXCaKzNXJYNMKQ9HB7FYvpaM3Gl789uP/0DpUdkPrqqaRzj3k8vSW5HsFd
 +T42rRIPjG2yviBBVlEnDBidKbgIQU6gNXqp3bL04dpCUEJOL1rYLi5IoprtFneaYpZmCZEVmTJ
 nuQFpe+xNckJQguatIlpT/rgLapq7Ng0purY39MAqsFDUKN9V+p+GJKhCFFk
X-Google-Smtp-Source: AGHT+IH2r0vurB18Xl9WLYVT+F/bcFedgdUzE7wlG+eEpi/OHIrJt3jq+M14kNzhS+9L51QgoppgkQ==
X-Received: by 2002:a05:600c:699b:b0:43c:e305:6d50 with SMTP id
 5b1f17b1804b1-43db62c034bmr86446655e9.24.1743430719888; 
 Mon, 31 Mar 2025 07:18:39 -0700 (PDT)
Received: from boomer.zia ([62.74.15.163]) by smtp.gmail.com with ESMTPSA id
 ffacd0b85a97d-39c0b79e082sm11610488f8f.69.2025.03.31.07.18.38
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Mon, 31 Mar 2025 07:18:39 -0700 (PDT)
From: Stephane Zermatten <szermatt@HIDDEN>
Date: Mon, 31 Mar 2025 17:18:35 +0300
Message-ID: <m2iknpthac.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Received-SPF: pass client-ip=2a00:1450:4864:20::334;
 envelope-from=szermatt@HIDDEN; helo=mail-wm1-x334.google.com
X-Spam_score_int: -19
X-Spam_score: -2.0
X-Spam_bar: --
X-Spam_report: (-2.0 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.001,
 FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.001,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: 1.0 (+)
X-Mailman-Approved-At: Mon, 31 Mar 2025 13:45:14 -0400
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.0 (/)

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Tags: patch

If I run a shell in a terminal with M-x term, with a very Unicode-heavy
prompt (fish 3.6 + tide), the Unicode characters are sometimes printed
undecoded.

One possible cause of this might be unfortunate chunking in the middle
of a character, which the attached patch fixes.

Without the patch, if I type this in M-x term /usr/bin/bash

for j in $(seq 0 3); do
  for i in $(seq 0 30); do
    printf '\xf0\x9f'; sleep 0.1; printf '\x98\x80';
  done;
  echo;
done

I get
 \360\237\203\022\360\...

Instead of:
 😀😀😀😀😀😀😀...

With the patch included, I get the correct output.
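
As a rough illustration of what the loop above sends (this sketch is not
part of the patch, and the function names are made up for the example),
the four UTF-8 bytes of the emoji can be dumped in octal, which is the
notation Emacs uses when it displays raw bytes:

```shell
# U+1F600 is four bytes in UTF-8; the reproduction writes them in two halves.
first_half()  { printf '\360\237'; }   # same bytes as printf '\xf0\x9f'
second_half() { printf '\230\200'; }   # same bytes as printf '\x98\x80'

# Print the bytes read from stdin as octal digits, with no separators.
octal_bytes() { od -An -to1 | tr -d ' \n'; }

first_half  | octal_bytes; echo                  # first write
second_half | octal_bytes; echo                  # second write, 0.1s later
{ first_half; second_half; } | octal_bytes; echo # the whole character
```

Neither half is a complete character on its own, so whatever reads the
first write has to hold on to those two bytes until the second write
arrives.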

The issue comes from an incorrect check (> count partial 0), which
should really be (and (>= count partial) (> partial 0)), but I
simplified that to (> partial 0) in the patch, because the while loop
guarantees (>= count partial).
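
To make the difference between the two checks concrete, here is a small
model of them in shell arithmetic. It is only a sketch: old_check,
new_check and describe are hypothetical names, and it assumes, based on
the patch context, that count is the length of the decoded chunk and
partial is the number of trailing undecoded bytes.

```shell
# old_check COUNT PARTIAL: models the original (> count partial 0),
# i.e. count > partial and partial > 0.
old_check() { [ "$1" -gt "$2" ] && [ "$2" -gt 0 ]; }

# new_check COUNT PARTIAL: models the patched (> partial 0).
new_check() { [ "$2" -gt 0 ]; }

# Report whether each check would keep the trailing bytes for the next chunk.
describe() {
  if old_check "$1" "$2"; then old=keep; else old=skip; fi
  if new_check "$1" "$2"; then new=keep; else new=skip; fi
  echo "count=$1 partial=$2 old=$old new=$new"
}

describe 5 1   # chunk ends with one dangling byte
describe 2 2   # chunk is nothing but half a character
describe 3 0   # chunk ends on a character boundary
```

The checks only disagree in the middle case, where count equals
partial: the original check skips the carry-over, while the patched one
keeps the bytes for the next chunk.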

I rewrote the existing test to cover this case and to try out several
different combinations of chunks.

I'm still looking into other causes of the issue, but this, at least,
seems like an easy fix.

In GNU Emacs 30.1 (build 2, x86_64-apple-darwin23.6.0, NS appkit-2487.70
 Version 14.7.4 (Build 23H420)) of 2025-03-24 built on boomer.zia
Windowing system distributor 'Apple', version 10.3.2487
System Description:  macOS 14.7.4

Configured using:
 'configure --disable-dependency-tracking --disable-silent-rules
 --enable-locallisppath=/usr/local/share/emacs/site-lisp
 --infodir=/usr/local/Cellar/emacs-plus@30/30.1/share/info/emacs
 --prefix=/usr/local/Cellar/emacs-plus@30/30.1
 --with-native-compilation=aot --with-xml2 --with-gnutls
 --without-compress-install --without-dbus --without-imagemagick
 --with-modules --with-rsvg --with-webp --with-ns
 --disable-ns-self-contained 'CFLAGS=-O2 -DFD_SETSIZE=10000
 -DDARWIN_UNLIMITED_SELECT -I/usr/local/opt/sqlite/include
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include'
 'LDFLAGS=-L/usr/local/opt/sqlite/lib -L/usr/local/lib/gcc/14
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include''


--=-=-=
Content-Type: text/patch; charset=utf-8
Content-Disposition: attachment;
 filename=0001-Fix-issue-with-very-short-multibyte-character-chunk.patch
Content-Transfer-Encoding: quoted-printable

From 2bb6cec8f4f72009bcde1edab367f90ab82e5e2a Mon Sep 17 00:00:00 2001
From: Stephane Zermatten <szermatt@HIDDEN>
Date: Mon, 31 Mar 2025 16:41:08 +0300
Subject: [PATCH] Fix issue with very short multibyte character chunk.

Before this change, a chunk containing only a part
of a multibyte character would be discarded and
displayed undecoded on the terminal.

* lisp/term.el
---
 lisp/term.el            |  2 +-
 test/lisp/term-tests.el | 15 ++++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/lisp/term.el b/lisp/term.el
index 862103d88e6..a971300c055 100644
--- a/lisp/term.el
+++ b/lisp/term.el
@@ -3116,7 +3116,7 @@ term-emulate-terminal
                                                           (- count 1 partial)))
                                       'eight-bit))
                         (incf partial))
-                      (when (> count partial 0)
+                      (when (> partial 0)
                         (setq term-terminal-undecoded-bytes
                               (substring decoded-substring (- partial)))
                         (setq decoded-substring
diff --git a/test/lisp/term-tests.el b/test/lisp/term-tests.el
index 5ef8c1174df..aad84e171b2 100644
--- a/test/lisp/term-tests.el
+++ b/test/lisp/term-tests.el
@@ -402,13 +402,14 @@ term-to-margin
 (ert-deftest term-decode-partial () ;; Bug#25288.
   "Test multibyte characters sent into multiple chunks."
   ;; Set `locale-coding-system' so test will be deterministic.
-  (let* ((locale-coding-system 'utf-8-unix)
-         (string (make-string 7 ?ш))
-         (bytes (encode-coding-string string locale-coding-system)))
-    (should (equal string
-                   (term-test-screen-from-input
-                    40 1 `(,(substring bytes 0 (/ (length bytes) 2))
-                           ,(substring bytes (/ (length bytes) 2))))))))
+  (let ((locale-coding-system 'utf-8-unix))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321" "\210\321\210\321\210"))))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321\210\321" "\210\321\210"))))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321\210\321\210\321" "\210"))))))
+
 (ert-deftest term-undecodable-input () ;; Bug#29918.
   "Undecodable bytes should be passed through without error."
   (let* ((locale-coding-system 'utf-8-unix) ; As above.
-- 
2.47.0


--=-=-=--




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Stephane Zermatten <szermatt@HIDDEN>
Subject: bug#77410: Acknowledgement (term.el sometimes prints undecoded
 multibyte UTF-8 chars)
Message-ID: <handler.77410.B.174344311718238.ack <at> debbugs.gnu.org>
References: <m2iknpthac.fsf@HIDDEN>
X-Gnu-PR-Message: ack 77410
X-Gnu-PR-Package: emacs
X-Gnu-PR-Keywords: patch
Reply-To: 77410 <at> debbugs.gnu.org
Date: Mon, 31 Mar 2025 17:46:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-gnu-emacs@HIDDEN

If you wish to submit further information on this problem, please
send it to 77410 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
77410: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=77410
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems



Last modified: Mon, 31 Mar 2025 18:00:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.