Subject: bug#77410: term.el sometimes prints undecoded multibyte UTF-8 chars
From: Stephane Zermatten <szermatt@HIDDEN>
Date: Mon, 31 Mar 2025 17:18:35 +0300

Tags: patch

If I run a shell in a terminal with M-x term, with a very Unicode-heavy
prompt (fish 3.6 + tide), the Unicode characters are sometimes printed
undecoded.

One possible cause is unfortunate chunking in the middle of a
character, which the attached patch fixes. Without the patch, if I run
M-x term with /usr/bin/bash and type this:

    for j in $(seq 0 3); do for i in $(seq 0 30); do printf '\xf0\x9f'; sleep 0.1; printf '\x98\x80'; done; echo; done

I get:

    \360\237\203\022\360\...

instead of:

    😀😀😀😀😀😀😀...

With the patch applied, I get the correct output.

The issue comes from an incorrect check, (> count partial 0), which
should really be (and (>= count partial) (> partial 0)). When every
byte of a chunk is still undecoded, for example a chunk holding only
the first bytes of a character, partial reaches count, so the old check
fails and the bytes are printed undecoded instead of being saved for
the next chunk. I simplified the condition to (> partial 0) in the
patch, because the while loop already guarantees (>= count partial).

I rewrote the existing test to cover this case and to try out several
different combinations of chunks.

I'm still looking into other causes of the issue, but this, at least,
seems like an easy fix.

In GNU Emacs 30.1 (build 2, x86_64-apple-darwin23.6.0, NS
appkit-2487.70 Version 14.7.4 (Build 23H420)) of 2025-03-24 built on
boomer.zia
Windowing system distributor 'Apple', version 10.3.2487
System Description: macOS 14.7.4

Configured using:
 'configure --disable-dependency-tracking --disable-silent-rules
 --enable-locallisppath=/usr/local/share/emacs/site-lisp
 --infodir=/usr/local/Cellar/emacs-plus@30/30.1/share/info/emacs
 --prefix=/usr/local/Cellar/emacs-plus@30/30.1
 --with-native-compilation=aot --with-xml2 --with-gnutls
 --without-compress-install --without-dbus --without-imagemagick
 --with-modules --with-rsvg --with-webp --with-ns
 --disable-ns-self-contained 'CFLAGS=-O2 -DFD_SETSIZE=10000
 -DDARWIN_UNLIMITED_SELECT -I/usr/local/opt/sqlite/include
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include'
 'LDFLAGS=-L/usr/local/opt/sqlite/lib -L/usr/local/lib/gcc/14
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include''
Attachment: 0001-Fix-issue-with-very-short-multibyte-character-chunk.patch

From 2bb6cec8f4f72009bcde1edab367f90ab82e5e2a Mon Sep 17 00:00:00 2001
From: Stephane Zermatten <szermatt@HIDDEN>
Date: Mon, 31 Mar 2025 16:41:08 +0300
Subject: [PATCH] Fix issue with very short multibyte character chunk.

Before this change, a chunk containing only a part of a multibyte
character would be discarded and displayed undecoded on the terminal.

* lisp/term.el
---
 lisp/term.el            |  2 +-
 test/lisp/term-tests.el | 15 ++++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/lisp/term.el b/lisp/term.el
index 862103d88e6..a971300c055 100644
--- a/lisp/term.el
+++ b/lisp/term.el
@@ -3116,7 +3116,7 @@ term-emulate-terminal
                                               (- count 1 partial)))
                                   'eight-bit))
             (incf partial))
-          (when (> count partial 0)
+          (when (> partial 0)
             (setq term-terminal-undecoded-bytes
                   (substring decoded-substring (- partial)))
             (setq decoded-substring
diff --git a/test/lisp/term-tests.el b/test/lisp/term-tests.el
index 5ef8c1174df..aad84e171b2 100644
--- a/test/lisp/term-tests.el
+++ b/test/lisp/term-tests.el
@@ -402,13 +402,14 @@ term-to-margin
 (ert-deftest term-decode-partial () ;; Bug#25288.
   "Test multibyte characters sent into multiple chunks."
   ;; Set `locale-coding-system' so test will be deterministic.
-  (let* ((locale-coding-system 'utf-8-unix)
-         (string (make-string 7 ?ш))
-         (bytes (encode-coding-string string locale-coding-system)))
-    (should (equal string
-                   (term-test-screen-from-input
-                    40 1 `(,(substring bytes 0 (/ (length bytes) 2))
-                           ,(substring bytes (/ (length bytes) 2))))))))
+  (let ((locale-coding-system 'utf-8-unix))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321" "\210\321\210\321\210"))))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321\210\321" "\210\321\210"))))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321\210\321\210\321" "\210"))))))
+
 (ert-deftest term-undecodable-input () ;; Bug#29918.
   "Undecodable bytes should be passed through without error."
   (let* ((locale-coding-system 'utf-8-unix) ; As above.
-- 
2.47.0
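The condition changed by the patch can be illustrated in isolation. The following is a minimal standalone sketch, not code from term.el: the function name `my-term-split-partial` is hypothetical, and it assumes, as the while loop in the diff does, that bytes which cannot be decoded yet appear as characters of charset `eight-bit` at the end of the decoded chunk. It returns the decodable part together with the trailing bytes that should be held back for the next chunk.

```elisp
;;; Standalone sketch of the partial-character check in
;;; `term-emulate-terminal'; NOT the actual term.el code.
;;; `my-term-split-partial' is a hypothetical name.
(require 'cl-lib)

(defun my-term-split-partial (decoded)
  "Split the decoded chunk DECODED into (COMPLETE . PENDING).
PENDING is the run of trailing raw bytes (charset `eight-bit')
that could not be decoded yet, for instance the first bytes of a
UTF-8 character whose remaining bytes arrive in the next chunk."
  (let ((count (length decoded))
        (partial 0))
    ;; Count trailing raw bytes.  The (< partial count) bound means
    ;; PARTIAL can never exceed COUNT.
    (while (and (< partial count)
                (eq (char-charset (aref decoded (- count 1 partial)))
                    'eight-bit))
      (cl-incf partial))
    ;; Patched condition.  The old (> count partial 0) was false when
    ;; the whole chunk was raw bytes (COUNT = PARTIAL), so nothing was
    ;; held back.
    (if (> partial 0)
        (cons (substring decoded 0 (- partial))
              (substring decoded (- partial)))
      (cons decoded ""))))

;; Usage: a chunk holding only the first two bytes of U+1F600.
(my-term-split-partial (decode-coding-string "\360\237" 'utf-8-unix))
;; COMPLETE is "" and PENDING holds the two raw bytes.
```

In the usage case both count and partial are 2, which is exactly the situation where the old condition held nothing back.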
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Stephane Zermatten <szermatt@HIDDEN>
Subject: bug#77410: Acknowledgement (term.el sometimes prints undecoded multibyte UTF-8 chars)
Date: Mon, 31 Mar 2025 17:46:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-gnu-emacs@HIDDEN

If you wish to submit further information on this problem, please
send it to 77410 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
77410: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=77410
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems