GNU bug report logs - #54032
29.0.50; Emoji display on Linux console switched to hexadecimal output


Package: emacs;

Reported by: Aura Kelloniemi <kaura.dev <at> sange.fi>

Date: Thu, 17 Feb 2022 06:58:01 UTC

Severity: normal

Found in version 29.0.50

To reply to this bug, email your comments to 54032 AT debbugs.gnu.org.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#54032; Package emacs. (Thu, 17 Feb 2022 06:58:01 GMT)

Acknowledgement sent to Aura Kelloniemi <kaura.dev <at> sange.fi>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 17 Feb 2022 06:58:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Aura Kelloniemi <kaura.dev <at> sange.fi>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; Emoji display on Linux console switched to hexadecimal output
Date: Thu, 17 Feb 2022 08:57:40 +0200

Hello,

on recent Emacs development repository builds, emoji characters are no longer
displayed on Linux console. Instead Emacs prints \UABCDEF hexadecimal codes.

This is due to the commit 10c680551e899805a6de7360e9b65986fd87df72 which
probably makes things better on some terminals. Reverting this commit fixed
the issue for me, and emojis are again displayed as usual.

Linux console is (sort of) capable of displaying emojis. Console font can be
configured so that it has glyphs for emojis exactly the same way as for other
characters. Also, blind users using refreshable braille displays use Linux
console to access Emacs. The braille terminal driver is able to detect the
correct character code points even when Linux itself is not able to display
them properly on the screen. This detection is done using the /dev/vcsu
(virtual console screen unicode) character devices. For these reasons it is
important that Emacs outputs the real characters to the terminal on Linux
console.
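
To illustrate the /dev/vcsu mechanism, here is a minimal Python sketch. It is
not taken from any braille driver. It assumes that a read from the device
yields one native-endian 32-bit Unicode code point per screen cell, as
described in vcs(4); the function names and the default device path are
hypothetical.

```python
import struct


def decode_vcsu(raw: bytes) -> list[int]:
    """Decode raw /dev/vcsu bytes into a list of Unicode code points.

    Assumes one native-endian unsigned 32-bit value per screen cell.
    Any trailing partial cell is ignored.
    """
    usable = len(raw) - len(raw) % 4
    count = usable // 4
    return list(struct.unpack(f"={count}I", raw[:usable]))


def read_console_text(device: str = "/dev/vcsu") -> str:
    """Return the screen contents of a virtual console as a string.

    The device path is an assumption; reading it needs a Linux kernel
    that provides vcsu devices and suitable permissions.
    """
    with open(device, "rb") as f:
        codepoints = decode_vcsu(f.read())
    # Skip values outside the Unicode range rather than failing.
    return "".join(chr(cp) for cp in codepoints if cp <= 0x10FFFF)
```

A reader of the device recovers the code points that were written to the
console, whether or not the console font has glyphs for them.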

Linux console has a terminal type string of "linux". lisp/term/linux.el
already contains some code specific to the Linux terminal (which unfortunately
assumes that Linux has a default character set of Latin-1, which has
never been true).

My preferred solution to this problem would be to add and document a way to
configure character display logic on TTYs more precisely. It would be great to
be able to control the terminal output of Unicode at grapheme-cluster
granularity – i.e. allow the user to define a function which translates code
points/grapheme clusters into something that their terminal can display.

I believe that Linux console is not the only terminal that behaves peculiarly
when it comes to Unicode support. So this might also benefit users of
terminals other than the Linux VT.

-- 
Aura

In GNU Emacs 29.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version 3.24.31, cairo version 1.17.4)
 of 2022-02-16 built on solaria
Repository revision: e6e723bb4d300e6ceeeb12bf43bf3d54a6108cac
Repository branch: makepkg
System Description: Arch Linux

Configured using:
 'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
 --localstatedir=/var --with-native-compilation --with-x-toolkit=gtk3
 --with-xft --with-wide-int --with-modules --with-gameuser=:games
 --with-sound=alsa --with-cairo --with-harfbuzz
 --enable-link-time-optimization 'CFLAGS=-march=native -mtune=native -O2 -pipe
 -fno-plt -fuse-ld=gold -flto' CPPFLAGS=-D_FORTIFY_SOURCE=2
 LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG JSON
LCMS2 LIBOTF LIBSYSTEMD LIBXML2 M17N_FLT MODULES NATIVE_COMP NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS WEBP
X11 XDBE XIM XPM GTK3 ZLIB

Important settings:
  value of $LANG: fi_FI.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  telega-root-auto-fill-mode: t
  telega-active-locations-mode: t
  telega-patrons-mode: t
  gpm-mouse-mode: t
  leaf-key-override-global-mode: t
  shell-dirtrack-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  icomplete-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: linux
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t

Load-path shadows:
/home/aura/.config/emacs/elpa/transient-20220130.1941/transient hides /usr/share/emacs/29.0.50/lisp/transient

Features:
(shadow sort company-oddmuse company-keywords company-etags etags fileloop
xref project company-gtags company-dabbrev-code company-dabbrev company-files
company-clang company-capf company-cmake company-semantic company-template
company-bbdb mail-extr textsec uni-scripts idna-mapping ucs-normalize
uni-confusable textsec-check shr pixel-fill kinsoku vterm bookmark face-remap
compile term disp-table ehelp vterm-module term/xterm xterm mule-util
telega-obsolete telega telega-tdlib-events telega-webpage visual-fill-column
telega-root telega-info telega-chat telega-modes image-mode exif
telega-company telega-user telega-notifications notifications dbus telega-voip
telega-msg telega-tme telega-sticker telega-i18n telega-vvnote bindat
telega-ffplay telega-media telega-sort telega-filter telega-ins telega-folders
telega-inline telega-tdlib telega-util rainbow-identifiers org-element
avl-tree generator org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro
org-footnote org-src ob-comint org-pcomplete org-list org-faces org-entities
noutline outline org-version ob-emacs-lisp ob-core ob-eval org-table oc-basic
bibtex ol org-keys oc org-compat advice org-macs org-loaddefs dired-aux color
ewoc telega-server telega-core telega-customize svg dom xml emacsbug sendmail
find-func cursor-sensor comp comp-cstr warnings rx cl-extra help-mode t-mouse
term/linux recentf tree-widget notmuch notmuch-tree notmuch-jump notmuch-hello
notmuch-show notmuch-print notmuch-crypto notmuch-mua notmuch-message
notmuch-draft notmuch-maildir-fcc notmuch-address notmuch-company
notmuch-parser notmuch-wash diff-mode easy-mmode coolj notmuch-query goto-addr
thingatpt icalendar diary-lib diary-loaddefs cal-menu calendar cal-loaddefs
notmuch-tag crm notmuch-lib notmuch-version notmuch-compat hl-line message
yank-media rmc puny dired dired-loaddefs rfc822 mml mailabbrev mail-utils
gmm-utils mailheader mm-view mml-smime mml-sec epa derived epg rfc6068
epg-config gnus-util text-property-search smime dig mm-decode mm-bodies
mm-encode mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr
company pcase server leaf-keywords leaf finder-inf package browse-url url
url-proxy url-privacy url-expand url-methods url-history url-cookie url-domsuf
url-util mailcap url-handlers url-parse url-vars tramp tramp-loaddefs trampver
tramp-integration cus-edit pp wid-edit files-x tramp-compat shell pcomplete
comint ansi-color ring parse-time iso8601 time-date ls-lisp format-spec
auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs password-cache json
map savehist minibuf-eldef keypad ido seq gv subr-x byte-opt bytecomp
byte-compile cconv icomplete desktop frameset cl-loaddefs cl-lib cus-load info
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow
isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice button loaddefs
faces cus-face macroexp files window text-properties overlay sha1 md5 base64
format env code-pages mule custom widget keymap hashtable-print-readable
backquote threads dbusbind inotify lcms2 dynamic-setting system-font-setting
font-render-setting cairo move-toolbar gtk x-toolkit x multi-tty
make-network-process native-compile emacs)

Memory information:
((conses 16 899499 30380)
 (symbols 48 32992 10)
 (strings 32 222707 14412)
 (string-bytes 1 6442228)
 (vectors 16 114679)
 (vector-slots 8 1497371 73794)
 (floats 8 8158 532)
 (intervals 56 5228 1491)
 (buffers 992 15))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54032; Package emacs. (Thu, 17 Feb 2022 07:53:02 GMT)

Message #8 received at 54032 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Aura Kelloniemi <kaura.dev <at> sange.fi>
Cc: 54032 <at> debbugs.gnu.org
Subject: Re: bug#54032: 29.0.50; Emoji display on Linux console switched to hexadecimal output
Date: Thu, 17 Feb 2022 09:52:15 +0200

> From: Aura Kelloniemi <kaura.dev <at> sange.fi>
> Date: Thu, 17 Feb 2022 08:57:40 +0200
> 
> on recent Emacs development repository builds, emoji characters are no longer
> displayed on Linux console. Instead Emacs prints \UABCDEF hexadecimal codes.
> 
> This is due to the commit 10c680551e899805a6de7360e9b65986fd87df72 which
> probably makes things better on some terminals. Reverting this commit fixed
> the issue for me, and emojis are again displayed as usual.

Thanks, but I think we need more detailed information to understand
the problem.  First, when you say "display emoji characters", which
characters exactly does that refer to?  Can you show specific
examples of text that includes Emoji, which displayed the Emoji glyphs
before the above commit, but not after it?  In particular, are we
talking about single codepoints in the Emoji block, or are we talking
about Emoji sequences that involve more than one codepoint (and are
supposed to display like a single Emoji glyph)?  For each example,
please show both the text and what you see on display for that text.

Also, what is the value of auto-composition-mode?  I think it should
be the string "linux", in which case please try setting it to t and
see if the display becomes better or worse.

> Linux console is (sort of) capable of displaying emojis. Console font can be
> configured so that it has glyphs for emojis exactly the same way as for other
> characters.

If that is the case, why did Emacs think the terminal cannot display
these characters?  Can you step with GDB inside terminal_glyph_code,
when it is called for the first time in the Emacs session, and see
whether the ioctl call we issue in calculate_glyph_code_table returns
valid values for the Emoji codepoints?
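
A rough Python model of that lookup, offered only as a sketch: it assumes
the table is built from the (unicode, fontpos) pairs that the GIO_UNIMAP
ioctl returns, and the names build_glyph_table and glyph_position are
hypothetical, not the Emacs functions.  In the kernel's struct unipair both
fields are unsigned short, so a table built this way cannot hold a code
point above U+FFFF, which may matter for Emoji outside the Basic
Multilingual Plane.

```python
# The kernel's struct unipair stores the code point in an unsigned short.
UNIPAIR_MAX_CODEPOINT = 0xFFFF


def build_glyph_table(unipairs):
    """Build a code point -> font position map from (unicode, fontpos) pairs."""
    table = {}
    for codepoint, fontpos in unipairs:
        if not 0 <= codepoint <= UNIPAIR_MAX_CODEPOINT:
            raise ValueError(f"code point {codepoint:#x} does not fit in a unipair")
        table[codepoint] = fontpos
    return table


def glyph_position(table, codepoint):
    """Return the font position for CODEPOINT, or None if the font lacks it."""
    return table.get(codepoint)
```

With a table built from pairs for U+0041 and U+2603, glyph_position returns
a font position for both, and None for U+1F600.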

> Also, blind users using refreshable braille displays use Linux
> console to access Emacs. The braille terminal driver is able to detect the
> correct character code points even when Linux itself is not able to display
> them properly on the screen. This detection is done using the /dev/vcsu
> (virtual console screen unicode) character devices.

Are you saying that the braille terminal driver will not respond to
the ioctl call we issue in calculate_glyph_code_table?  Is there any
other method of knowing which characters are supported in that case?

> For these reasons it is important that Emacs outputs the real
> characters to the terminal on Linux console.

Outputting codepoints for which there are no glyphs produces an
unreadable display, since (AFAIU) the console displays them all as the
"diamond" replacement character.  Detecting the fact that a codepoint
cannot be displayed allows us to produce something that at least can
be interpreted, and allows the user to install optional features (such
as those provided by latin1-disp.el) which will replace the characters
that cannot be displayed by equivalent strings, for example ASCII
strings.
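
The fallback logic described here can be sketched in Python as follows.
This is an illustration under stated assumptions, not how Emacs implements
it: the function name, the set of displayable code points, the substitution
map, and the \UXXXXXX escape format are all hypothetical.

```python
def render_for_console(text, displayable, substitutes=None):
    """Return TEXT as it might be sent to a console with a limited font.

    displayable: set of code points the console font can show.
    substitutes: optional map from a character to an equivalent string
                 (for example an ASCII approximation).
    """
    substitutes = substitutes or {}
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in displayable:
            out.append(ch)                 # the font has a glyph: send as-is
        elif ch in substitutes:
            out.append(substitutes[ch])    # user-installed replacement string
        else:
            out.append(f"\\U{cp:06X}")     # last resort: readable hex escape
    return "".join(out)
```

The order of the checks reflects the idea above: a character is replaced
only when detection says it cannot be displayed, and the hex escape is used
only when no equivalent string is available.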

So we would like to keep the automatic detection of whether a given
character can be displayed by the console, although it sounds like the
current solution should be made more flexible and sophisticated in
some way.

> Linux console has a terminal type string of "linux". lisp/term/linux.el
> already contains some code specific to the Linux terminal (which unfortunately
> assumes that Linux has a default character set of Latin-1, which has
> never been true).

That's just the default.  We attempt to detect which characters can be
displayed later on, when the functions I mentioned above are called
during startup.

> My preferred solution to this problem would be to add and document a way to
> configure character display logic on TTYs more precisely. It would be great to
> be able to control the terminal output of Unicode at grapheme-cluster
> granularity – i.e. allow the user to define a function which translates code
> points/grapheme clusters into something that their terminal can display.

I'm not yet sure something like that would be needed.  It will
certainly slow down the display on the console, which is undesirable
for obvious reasons.  It is also too complex (not every Emacs user can
write Lisp programs that play sophisticated games with characters and
glyphs).

I think we don't yet have a detailed enough understanding of the issue
to discuss solutions, so I suggest postponing this discussion until
the questions I asked above are answered, and we have a good
understanding of what is going on.

The commit you point to was made based on reports from
another user of the Linux console (albeit not about Emoji), and in
that case the change had a positive effect.  So the issue is not
simple, and we need a good understanding of it before we devise a
solution.

Thanks.




This bug report was last modified 2 years and 283 days ago.


