GNU logs - #70000, boring messages


Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#70000: 29.2; Grapheme handling incorrect
Resent-From: Phillip Susi <phill@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Mon, 25 Mar 2024 18:47:01 +0000
Resent-Message-ID: <handler.70000.B.171139236311697 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 70000
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: 70000 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-gnu-emacs@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.171139236311697
          (code B ref -1); Mon, 25 Mar 2024 18:47:01 +0000
Received: (at submit) by debbugs.gnu.org; 25 Mar 2024 18:46:03 +0000
Received: from localhost ([127.0.0.1]:36258 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ropKc-00032a-8G
	for submit <at> debbugs.gnu.org; Mon, 25 Mar 2024 14:46:02 -0400
Received: from lists.gnu.org ([2001:470:142::17]:44128)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <phill@HIDDEN>) id 1ropKY-00031n-V0
 for submit <at> debbugs.gnu.org; Mon, 25 Mar 2024 14:46:00 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <phill@HIDDEN>)
 id 1ropKS-0005wA-5o
 for bug-gnu-emacs@HIDDEN; Mon, 25 Mar 2024 14:45:52 -0400
Received: from vps.thesusis.net ([34.202.238.73])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <phill@HIDDEN>)
 id 1ropKQ-0001lu-An
 for bug-gnu-emacs@HIDDEN; Mon, 25 Mar 2024 14:45:51 -0400
Received: by vps.thesusis.net (Postfix, from userid 1000)
 id C454A2B46D; Mon, 25 Mar 2024 14:45:48 -0400 (EDT)
From: Phillip Susi <phill@HIDDEN>
Date: Mon, 25 Mar 2024 14:45:48 -0400
Message-ID: <878r26duar.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain
Received-SPF: pass client-ip=34.202.238.73; envelope-from=phill@HIDDEN;
 helo=vps.thesusis.net
X-Spam_score_int: -18
X-Spam_score: -1.9
X-Spam_bar: -
X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: 0.9 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.1 (/)

I had some terminal breakage the other day when browsing email with
notmuch.  Now a ways down the rabbit hole, it seems this is because
emacs does not correctly handle graphemes.  I found this article here:

https://mitchellh.com/writing/grapheme-clusters-in-terminals

If I paste that grapheme into GUI emacs, it is displayed as two separate
characters, each two columns wide, instead of the correct way: as a
single double wide character.  C-f and C-b move over the character as if
it were one, however, backspace deletes only the second, leaving both
the first and the zero width joiner.  If C-f and C-b treat it as one,
then so should backspace.

Under recent versions of the foot terminal emulator, this character is
displayed as a single, double wide character, but emacs assumes it still
is 4 columns wide, leading to terminal breakage.  Emacs needs to not
assume the width of graphemes is what wcwidth() reports, but instead
needs to query the cursor position after printing one to find out how
wide the terminal actually displayed it.
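
To make the "4 columns" figure concrete, here is a minimal Python sketch
of a per-code-point width sum in the style of wcwidth().  It is only an
approximation built on the standard unicodedata module, not what Emacs
or any libc actually does, and the example sequence (U+1F9D1 ZWJ
U+1F33E) is an assumption, since the report does not name the exact
sequence.  The function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def codepoint_width(ch: str) -> int:
    """Approximate wcwidth() for one code point (simplified rules)."""
    # Combining marks and format characters (such as ZWJ) take no column.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide / Fullwidth characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def naive_width(text: str) -> int:
    """Sum per-code-point widths, ignoring grapheme clustering."""
    return sum(codepoint_width(ch) for ch in text)


# Assumed example: two wide emoji joined by ZWJ.
farmer = "\U0001F9D1" + ZWJ + "\U0001F33E"
print(naive_width(farmer))  # 2 + 0 + 2 columns
```

A terminal that renders the whole sequence as one cluster would use 2
columns, so the two sides disagree by 2 columns for every such
sequence on a line.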



In GNU Emacs 29.2 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.39,
 cairo version 1.18.0) of 2024-02-26 built on localhost
System Description: Gentoo Linux

Configured using:
 'configure --prefix=/usr --build=x86_64-pc-linux-gnu
 --host=x86_64-pc-linux-gnu --mandir=/usr/share/man
 --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc
 --localstatedir=/var/lib --datarootdir=/usr/share
 --disable-silent-rules --docdir=/usr/share/doc/emacs-29.2-r1
 --htmldir=/usr/share/doc/emacs-29.2-r1/html --libdir=/usr/lib64
 --program-suffix=-emacs-29 --includedir=/usr/include/emacs-29
 --infodir=/usr/share/info/emacs-29 --localstatedir=/var
 --enable-locallisppath=/etc/emacs:/usr/share/emacs/site-lisp
 --without-compress-install --without-hesiod --without-pop
 --with-file-notification=inotify --with-pdumper --enable-acl
 --with-dbus --with-modules --without-gameuser --with-libgmp --with-gpm
 --with-native-compilation=aot --without-json --without-kerberos
 --without-kerberos5 --with-lcms2 --without-xml2 --without-mailutils
 --without-selinux --without-sqlite3 --with-gnutls --with-libsystemd
 --with-threads --with-tree-sitter --without-wide-int --with-sound=alsa
 --with-zlib --with-pgtk --without-x --without-ns
 --with-toolkit-scroll-bars --without-gconf --without-gsettings
 --without-harfbuzz --without-libotf --without-m17n-flt
 --without-xwidgets --with-gif --with-jpeg --with-png --with-rsvg
 --with-tiff --without-webp --without-imagemagick --with-dumping=pdumper
 'CFLAGS=-march=native -O2 -pipe' 'LDFLAGS=-Wl,-O1 -Wl,--as-needed''

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM JPEG LCMS2 LIBSYSTEMD
MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PGTK PNG RSVG SECCOMP SOUND
THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER XIM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  column-number-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message yank-media puny dired
dired-loaddefs rfc822 mml mml-sec epa derived epg rfc6068 epg-config
gnus-util text-property-search time-date mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047
rfc2045 ietf-drums mm-util mail-prsvr mail-utils cus-start cus-load
wid-edit descr-text enriched disp-table facemenu comp comp-cstr warnings
icons rx cl-extra help-mode manoj-dark-theme site-gentoo
ranger-autoloads scopeline-autoloads package browse-url url url-proxy
url-privacy url-expand url-methods url-history url-cookie
generate-lisp-file url-domsuf url-util mailcap url-handlers url-parse
auth-source cl-seq eieio eieio-core cl-macs password-cache json subr-x
map byte-opt gv bytecomp byte-compile url-vars cl-loaddefs cl-lib rmc
iso-transl tooltip cconv eldoc paren electric uniquify ediff-hook
vc-hooks lisp-float-type elisp-mode mwheel term/pgtk-win pgtk-win
term/common-win pgtk-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads dbusbind inotify dynamic-setting font-render-setting cairo gtk
pgtk lcms2 multi-tty make-network-process native-compile emacs)

Memory information:
((conses 16 121243 14450)
 (symbols 48 22924 0)
 (strings 32 87992 2869)
 (string-bytes 1 2065634)
 (vectors 16 27491)
 (vector-slots 8 1623278 223666)
 (floats 8 58 48)
 (intervals 56 908 0)
 (buffers 984 13))




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Phillip Susi <phill@HIDDEN>
Subject: bug#70000: Acknowledgement (29.2; Grapheme handling incorrect)
Message-ID: <handler.70000.B.171139236311697.ack <at> debbugs.gnu.org>
References: <878r26duar.fsf@HIDDEN>
X-Gnu-PR-Message: ack 70000
X-Gnu-PR-Package: emacs
Reply-To: 70000 <at> debbugs.gnu.org
Date: Mon, 25 Mar 2024 18:47:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-gnu-emacs@HIDDEN

If you wish to submit further information on this problem, please
send it to 70000 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

--=20
70000: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D70000
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#70000: 29.2; Grapheme handling incorrect
Resent-From: Eli Zaretskii <eliz@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Mon, 25 Mar 2024 19:36:02 +0000
Resent-Message-ID: <handler.70000.B70000.171139533417122 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 70000
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: Phillip Susi <phill@HIDDEN>
Cc: 70000 <at> debbugs.gnu.org
Received: via spool by 70000-submit <at> debbugs.gnu.org id=B70000.171139533417122
          (code B ref 70000); Mon, 25 Mar 2024 19:36:02 +0000
Received: (at 70000) by debbugs.gnu.org; 25 Mar 2024 19:35:34 +0000
Received: from localhost ([127.0.0.1]:36326 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1roq6Y-0004S5-5n
	for submit <at> debbugs.gnu.org; Mon, 25 Mar 2024 15:35:34 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:46706)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>)
 id 1roq6W-0004Rn-3M; Mon, 25 Mar 2024 15:35:33 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1roq6R-0001Vf-Eb; Mon, 25 Mar 2024 15:35:27 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=uOxBp07R6KwnVrSVYMxQpKNQmjxmW1f7rvO5ks6F4uM=; b=pilbHEyD20y7
 fW2KGrDS+2eUVmRfSPU/6bfxUKJ0pah0RNZ3oymn9N+t/Gml7uTlYTTol0MGd3RsCgmMOfyNOTWVo
 ydRGBUW5tEGjnnp/63omIjDR25MyKYPkBWiO6b2r4RoTepsJiD66ZwO3vBSS0hujM12cNUZFcO70n
 BrPTHGAiSVS9YlcATi1ppDW6V29rlEECOX5sdj0PeNk0KsDGJhxAE5WEFjg1uUYux0vM3Zp6qegYr
 22XTJJjvu+sd6NO6QHZV7gHoOb6wFXkhnNF9ca4xsx6mGCTyJlNq7lBD/gsl4ckJnNOVN2a3rYesk
 1L+jRiZY1T0Wxoj8UnwRGg==;
Date: Mon, 25 Mar 2024 21:35:24 +0200
Message-Id: <86cyrije9v.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
In-Reply-To: <878r26duar.fsf@HIDDEN> (message from Phillip Susi on
 Mon, 25 Mar 2024 14:45:48 -0400)
References: <878r26duar.fsf@HIDDEN>
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

tags 70000 notabug
thanks

> From: Phillip Susi <phill@HIDDEN>
> Date: Mon, 25 Mar 2024 14:45:48 -0400
> 
> I had some terminal breakage the other day when browsing email with
> notmuch.  Now a ways down the rabbit hole, it seems this is because
> emacs does not correctly handle graphemes.  I found this article here:
> 
> https://mitchellh.com/writing/grapheme-clusters-in-terminals
> 
> If I paste that grapheme into GUI emacs, it is displayed as two separate
> characters, each two columns wide, instead of the correct way: as a
> single double wide character.

First, the above blog talks about text-mode terminals (a.k.a. "TTYs"),
so it is not relevant to a GUI Emacs session.

And second, how that particular sequence of codepoints is displayed on
GUI frames depends on how your Emacs was built.  According to the list
of features included in your report, viz.:

  Configured features:
  ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM JPEG LCMS2 LIBSYSTEMD
  MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PGTK PNG RSVG SECCOMP SOUND
  THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER XIM GTK3 ZLIB

your Emacs is built without HarfBuzz, which I think explains why your
Emacs displays the above sequences as 2 separate characters.
Furthermore, the appearance depends on the fonts you have installed;
specifically, Emoji sequences need a font that has good support for
the Emoji Unicode blocks.  In my Emacs, which does use HarfBuzz, I see
a single grapheme cluster.

> C-f and C-b move over the character as if
> it were one, however, backspace deletes only the second, leaving both
> the first and the zero width joiner.  If C-f and C-b treat it as one,
> then so should backspace.

That Backspace deletes a single codepoint is a feature: it allows
easier editing of composable character sequences, such as Emoji.
E.g., imagine you want to make a slight change to the Emoji by
modifying just the second of the two characters composed into a
grapheme cluster.  Emacs supports deletion of the entire grapheme
cluster with the command delete-forward-char, by default bound to the
<Delete> function key.
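
The difference between the two deletion behaviors can be sketched in
Python as below.  This is an illustration only: the function names are
hypothetical, the ZWJ-chaining rule is a deliberate simplification of
real grapheme cluster segmentation, and it is not how Emacs implements
either command.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def delete_last_codepoint(text: str) -> str:
    """Remove only the final code point, as Backspace is described to do."""
    return text[:-1]


def delete_last_zwj_sequence(text: str) -> str:
    """Remove the final code point plus everything chained to it by ZWJ.

    Simplified stand-in for deleting a whole grapheme cluster.
    """
    end = len(text)
    if end == 0:
        return text
    end -= 1  # drop the final code point
    # While the preceding code point is a ZWJ, drop it and the
    # code point it joins to.
    while end >= 2 and text[end - 1] == ZWJ:
        end -= 2
    return text[:end]


# Assumed example sequence: two emoji joined by ZWJ, preceded by "a".
sample = "a" + "\U0001F9D1" + ZWJ + "\U0001F33E"
print(ascii(delete_last_codepoint(sample)))     # first emoji and ZWJ remain
print(ascii(delete_last_zwj_sequence(sample)))  # only "a" remains
```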

> Under recent versions of the foot terminal emulator, this character is
> displayed as a single, double wide character, but emacs assumes it still
> is 4 columns wide, leading to terminal breakage.

Emacs cannot know what the terminal does with these characters,
because there's no widely-accepted protocol for accessing that
information.  Different terminal emulators behave differently, and
some even have options to modify their behavior via various settings.

> Emacs needs to not assume the width of graphemes is what wcwidth()
> reports, but instead needs to query the cursor position after
> printing one to find out how wide the terminal actually displayed
> it.

Querying the cursor position won't help in this case because it is
Emacs that moves the cursor when you type C-f, not the terminal.
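
For readers unfamiliar with the mechanism under discussion: the cursor
position query is the Device Status Report request ESC [ 6 n, which a
terminal answers with ESC [ row ; col R.  The Python sketch below only
shows that wire format and how a width could be derived from two
replies; the function names are hypothetical, and it does not address
the objection stated above.

```python
import re

# Request the terminal to report the cursor position (DSR 6).
CPR_QUERY = b"\x1b[6n"

# Reply format: ESC [ <row> ; <col> R, both 1-based.
_CPR_REPLY = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: bytes):
    """Return (row, col) from a cursor position report, or None."""
    m = _CPR_REPLY.search(reply)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def columns_advanced(before: bytes, after: bytes):
    """Columns the cursor moved between two reports on the same row."""
    b, a = parse_cpr(before), parse_cpr(after)
    if b is None or a is None or b[0] != a[0]:
        return None  # unparsable reply, or the cursor wrapped to another row
    return a[1] - b[1]


print(parse_cpr(b"\x1b[12;40R"))
print(columns_advanced(b"\x1b[5;10R", b"\x1b[5;12R"))
```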

I see no Emacs bug here.  Until we have standard ways of querying
text-mode terminals about their processing of composable character
sequences into grapheme clusters, there's no way for Emacs to behave
correctly with all such terminal emulators.  Sorry.




Message received at control <at> debbugs.gnu.org:


Received: (at control) by debbugs.gnu.org; 25 Mar 2024 19:35:35 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Mar 25 15:35:35 2024
Received: from localhost ([127.0.0.1]:36328 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1roq6Y-0004S7-NA
	for submit <at> debbugs.gnu.org; Mon, 25 Mar 2024 15:35:35 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:46706)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>)
 id 1roq6W-0004Rn-3M; Mon, 25 Mar 2024 15:35:33 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1roq6R-0001Vf-Eb; Mon, 25 Mar 2024 15:35:27 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=uOxBp07R6KwnVrSVYMxQpKNQmjxmW1f7rvO5ks6F4uM=; b=pilbHEyD20y7
 fW2KGrDS+2eUVmRfSPU/6bfxUKJ0pah0RNZ3oymn9N+t/Gml7uTlYTTol0MGd3RsCgmMOfyNOTWVo
 ydRGBUW5tEGjnnp/63omIjDR25MyKYPkBWiO6b2r4RoTepsJiD66ZwO3vBSS0hujM12cNUZFcO70n
 BrPTHGAiSVS9YlcATi1ppDW6V29rlEECOX5sdj0PeNk0KsDGJhxAE5WEFjg1uUYux0vM3Zp6qegYr
 22XTJJjvu+sd6NO6QHZV7gHoOb6wFXkhnNF9ca4xsx6mGCTyJlNq7lBD/gsl4ckJnNOVN2a3rYesk
 1L+jRiZY1T0Wxoj8UnwRGg==;
Date: Mon, 25 Mar 2024 21:35:24 +0200
Message-Id: <86cyrije9v.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Phillip Susi <phill@HIDDEN>
In-Reply-To: <878r26duar.fsf@HIDDEN> (message from Phillip Susi on
 Mon, 25 Mar 2024 14:45:48 -0400)
Subject: Re: bug#70000: 29.2; Grapheme handling incorrect
References: <878r26duar.fsf@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: control
Cc: 70000 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

tags 70000 notabug
thanks






Last modified: Mon, 25 Mar 2024 19:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.