GNU bug report logs - #39799
28.0.50; Most emoji sequences don’t render correctly

Previous Next

Package: emacs;

Reported by: Mike FABIAN <mfabian <at> redhat.com>

Date: Wed, 26 Feb 2020 14:30:03 UTC

Severity: normal

Found in version 28.0.50

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 39799 in the body.
You can then email your comments to 39799 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Wed, 26 Feb 2020 14:30:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mike FABIAN <mfabian <at> redhat.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 26 Feb 2020 14:30:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.0.50; Most emoji sequences don’t render
 correctly
Date: Wed, 26 Feb 2020 15:28:58 +0100
[Message part 1 (text/plain, inline)]
As can be seen in the attached screenshot, some 

👩‍🦰 U+1F469 U+200D U+1F9B0 woman: red hair
🧑‍🦰 U+1F9D1 U+200D U+1F9B0 person: red hair

don’t render correctly in the screenshot, although they work using the
same font (“Joypixels”, version 5.5) elsewhere, e.g. in gedit.

Same result in Emacs when using "Noto Color Emoji", both emoji sequences
are rendered as 2 characters each in Emacs (In gedit, “U+1F469 U+200D
U+1F9B0 woman: red hair” works but “U+1F9D1 U+200D U+1F9B0 person: red
hair” does not, so this is likely because the “Noto Color Emoji” font
does not yet support the latter sequence).

When loading

http://www.unicode.org/Public/emoji/12.0/emoji-zwj-sequences.txt

into Emacs one can see that most sequences don’t render correctly
(actually *all* sequences, as far as I can see).

Also, when loading

http://www.unicode.org/Public/emoji/12.0/emoji-sequences.txt

into Emacs, one can see that the Flag sequences and skin colour
sequences don’t render correctly either (not a font problem, both
“Noto Color Emoji” and “Joypixels” support these):

1F1FF 1F1FC   ; RGI_Emoji_Flag_Sequence      ; flag: Zimbabwe                                                 # E2.0   [1] (🇿🇼)

1F3F4 E0067 E0062 E0065 E006E E0067 E007F; RGI_Emoji_Tag_Sequence; flag: England                              # E5.0   [1] (🏴󠁧󠁢󠁥󠁮󠁧󠁿)

261D 1F3FB    ; RGI_Emoji_Modifier_Sequence  ; index pointing up: light skin tone                             # E1.0   [1] (☝🏻)

------------


In GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.13, cairo version 1.16.0)
 of 2020-02-26 built on taka.site
Repository revision: 1dd44e633aed1ea10e9b611e844618814d6537aa
Repository branch: emacs-master-mike
Windowing system distributor 'Fedora Project', version 11.0.12006000
System Description: Fedora 31 (Workstation Edition)

Recent messages:
Wrote /home/mfabian/.newsrc.eld
Saving /home/mfabian/.newsrc.eld...done
No more unseen articles
No more unread articles
Mark activated
Updating buffer list...done
Commands: m, u, t, RET, g, k, S, D, Q; q to quit; h for help
Mark set
Quit
Mark activated

Configured using:
 'configure --prefix=/packages/stow/emacs-master-20200226 --with-cairo'

Configured features:
XPM JPEG TIFF GIF PNG RSVG CAIRO SOUND GPM DBUS GSETTINGS GLIB NOTIFY
INOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ M17N_FLT LIBOTF
ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS LIBSYSTEMD
PDUMPER LCMS2 GMP

Important settings:
  value of $LC_MESSAGES: ja_JP.UTF-8
  value of $LC_TIME: ja_JP.UTF-8
  value of $LANG: C.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Message

Minor modes in effect:
  gnus-message-citation-mode: t
  mml-mode: t
  global-edit-server-edit-mode: t
  erc-networks-mode: t
  erc-menu-mode: t
  erc-list-mode: t
  erc-pcomplete-mode: t
  erc-autoaway-mode: t
  erc-log-mode: t
  erc-button-mode: t
  erc-netsplit-mode: t
  erc-ring-mode: t
  erc-fill-mode: t
  erc-stamp-mode: t
  erc-track-mode: t
  erc-track-minor-mode: t
  erc-match-mode: t
  erc-autojoin-mode: t
  erc-irccontrols-mode: t
  erc-noncommands-mode: t
  erc-readonly-mode: t
  erc-scrolltobottom-mode: t
  jabber-activity-mode: t
  show-paren-mode: t
  display-time-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  auto-fill-function: message-do-auto-fill
  transient-mark-mode: t
  abbrev-mode: t

Load-path shadows:
/home/mfabian/emacs-packages/woman hides /packages/stow/emacs-master-20200226/share/emacs/28.0.50/lisp/woman
/home/mfabian/emacs-packages/xt-mouse hides /packages/stow/emacs-master-20200226/share/emacs/28.0.50/lisp/xt-mouse
/home/mfabian/emacs/find-dired hides /packages/stow/emacs-master-20200226/share/emacs/28.0.50/lisp/find-dired
/home/mfabian/emacs/refill hides /packages/stow/emacs-master-20200226/share/emacs/28.0.50/lisp/textmodes/refill

Features:
(shadow emacsbug mm-archive jka-compr canlock sort gnus-cite mail-extr
gnus-bcklg misearch multi-isearch gnus-async qp gnus-ml disp-table
gnus-topic cursor-sensor utf-7 nndraft nnmh network-stream nsm nnml
gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-cache
gnus-demon nntp smtpmail sendmail external-abook nnir gnus-msg gnus-art
mm-uu mml2015 mm-view mml-smime smime dig gnus-sum url url-proxy
url-privacy url-expand url-methods url-history shr url-cookie url-domsuf
url-util url-parse url-vars svg gnus-group gnus-undo gnus-start
gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo parse-time iso8601
gnus-spec gnus-int gnus-range message rmc rfc822 mml mml-sec epa epg
epg-config mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader gnus-win gnus nnheader gnus-util rmail
rmail-loaddefs rfc2047 rfc2045 ietf-drums text-property-search
mail-utils mm-util mail-prsvr ibuf-ext ibuffer ibuffer-loaddefs server
edit-server quail erc-networks erc-menu erc-list erc-pcomplete pcomplete
erc-autoaway erc-log erc-button browse-url erc-netsplit erc-ring
erc-fill erc-stamp erc-track cl-extra help-mode sauron-erc sauron
derived erc-match erc-join erc-goodies erc erc-backend erc-compat
auth-source eieio eieio-core eieio-loaddefs password-cache json map
thingatpt pp erc-loaddefs jabber jabber-libnotify dbus jabber-awesome
jabber-osd jabber-wmii jabber-xmessage jabber-festival jabber-sawfish
jabber-ratpoison jabber-screen jabber-socks5 jabber-ft-server
jabber-si-server jabber-ft-client jabber-ft-common jabber-si-client
jabber-si-common jabber-feature-neg jabber-truncate jabber-time
jabber-autoaway time-date subr-x jabber-vcard-avatars jabber-chatstates
jabber-events jabber-vcard jabber-avatar mailcap jabber-activity
jabber-watch jabber-modeline advice jabber-ahc-presence jabber-ahc
jabber-version jabber-ourversion jabber-muc-nick-completion hippie-exp
comint ansi-color ring jabber-browse jabber-search jabber-register
jabber-roster format-spec jabber-presence jabber-muc
jabber-muc-nick-coloring assoc hexrgb jabber-newdisco jabber-widget
jabber-disco jabber-chat jabber-history jabber-chatbuffer jabber-alert
jabber-iq jabber-core jabber-console sgml-mode dom ewoc jabber-keymap
jabber-sasl sasl sasl-anonymous sasl-login sasl-plain fsm jabber-logon
jabber-conn srv dns tls gnutls puny seq byte-opt bytecomp byte-compile
cconv jabber-xml xml jabber-menu jabber-autoloads jabber-util starttls
footnote rx w3m-cookie w3m easymenu timezone w3m-hist w3m-fb easy-mmode
w3m-ems mule-util w3m-ccl ccl w3m-favicon w3m-image cl-seq w3m-proc
w3m-util wid-edit cl-macs cl gv edmacro kmacro cl-loaddefs cl-lib
find-dired dired dired-loaddefs ispell paren avoid time tooltip eldoc
electric uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win
x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads dbusbind inotify lcms2 dynamic-setting
system-font-setting font-render-setting cairo move-toolbar gtk x-toolkit
x multi-tty make-network-process emacs)

Memory information:
((conses 16 1335259 111919)
 (symbols 48 25134 2)
 (strings 32 100843 24645)
 (string-bytes 1 2730315)
 (vectors 16 51368)
 (vector-slots 8 1397567 305406)
 (floats 8 363 323)
 (intervals 56 14489 1377)
 (buffers 1000 80))

-- 
Mike FABIAN <mfabian <at> redhat.com>

[emacs-color-emoji.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 07:15:01 GMT) Full text and rfc822 format available.

Message #8 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50;
 Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 09:14:14 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Date: Wed, 26 Feb 2020 15:28:58 +0100
> 
> As can be seen in the attached screenshot, some 
> 
> 👩‍🦰 U+1F469 U+200D U+1F9B0 woman: red hair
> 🧑‍🦰 U+1F9D1 U+200D U+1F9B0 person: red hair
> 
> don’t render correctly in the screenshot, although they work using the
> same font (“Joypixels”, version 5.5) elsewhere, e.g. in gedit.
> 
> Same result in Emacs when using "Noto Color Emoji", both emoji sequences
> are rendered as 2 characters each in Emacs

Not 2, 3.  Look more closely, and you will see that the U+200D ZWJ
character is displayed as a thin (1-pixel) space between the 2 emoji.

> When loading
> 
> http://www.unicode.org/Public/emoji/12.0/emoji-zwj-sequences.txt
> 
> into Emacs one can see that most sequences don’t render correctly
> (actually *all* sequences, as far as I can see).

That's just a matter of setting up composition-function-table to
support these sequences.  For example, try the above again after
evaluating:

  (set-char-table-range composition-function-table '(#x1F9B0 . #x1F9B3)
			(list
			 (vector
			  "[\U0001F468-\U0001F469]\u200D[\U0001F9B0-\U0001F9B3]"
			  2
			  'compose-gstring-for-graphic)))

Patches are welcome to convert the emoji-related files in Unicode's
character database into appropriate composition-function-table setup,
similar to the example above.  Some script to be run at Emacs build
time and produce, say, lisp/emoji.el to populate
composition-function-table, would be nice (see the Awk scripts in
admin/unidata as one source of inspiration).

> Also, when loading
> 
> http://www.unicode.org/Public/emoji/12.0/emoji-sequences.txt
> 
> into Emacs, one can see that the Flag sequences and skin colour
> sequences don’t render correctly either (not a font problem, both
> “Noto Color Emoji” and “Joypixels” support these):

If you mean they are not displayed in correct colors, then Emacs
doesn't yet support color emoji, we lack some infrastructure for
that.  Again, work in that area is welcome, it should be relatively
easy since we now have HarfBuzz support for text shaping.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 07:37:01 GMT) Full text and rfc822 format available.

Message #11 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 08:36:10 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Date: Wed, 26 Feb 2020 15:28:58 +0100
>> 
>> As can be seen in the attached screenshot, some 
>> 
>> 👩‍🦰 U+1F469 U+200D U+1F9B0 woman: red hair
>> 🧑‍🦰 U+1F9D1 U+200D U+1F9B0 person: red hair
>> 
>> don’t render correctly in the screenshot, although they work using the
>> same font (“Joypixels”, version 5.5) elsewhere, e.g. in gedit.
>> 
>> Same result in Emacs when using "Noto Color Emoji", both emoji sequences
>> are rendered as 2 characters each in Emacs
>
> Not 2, 3.  Look more closely, and you will see that the U+200D ZWJ
> character is displayed as a thin (1-pixel) space between the 2 emoji.

Yes.

>> When loading
>> 
>> http://www.unicode.org/Public/emoji/12.0/emoji-zwj-sequences.txt
>> 
>> into Emacs one can see that most sequences don’t render correctly
>> (actually *all* sequences, as far as I can see).
>
> That's just a matter of setting up composition-function-table to
> support these sequences.  For example, try the above again after
> evaluating:
>
>   (set-char-table-range composition-function-table '(#x1F9B0 . #x1F9B3)
> 			(list
> 			 (vector
> 			  "[\U0001F468-\U0001F469]\u200D[\U0001F9B0-\U0001F9B3]"
> 			  2
> 			  'compose-gstring-for-graphic)))

Yes, that does indeed work.

> Patches are welcome to convert the emoji-related files in Unicode's
> character database into appropriate composition-function-table setup,
> similar to the example above.  Some script to be run at Emacs build
> time and produce, say, lisp/emoji.el to populate
> composition-function-table, would be nice (see the Awk scripts in
> admin/unidata as one source of inspiration).

Pango also has a .c file which is generated by a python script from
the Unicode emoji data files to make all these sequences known to Pango.

I can try to write a script. Would it be OK to use Python for such a
script generating emoji.el?

>> Also, when loading
>> 
>> http://www.unicode.org/Public/emoji/12.0/emoji-sequences.txt
>> 
>> into Emacs, one can see that the Flag sequences and skin colour
>> sequences don’t render correctly either (not a font problem, both
>> “Noto Color Emoji” and “Joypixels” support these):
>
> If you mean they are not displayed in correct colors, then Emacs
> doesn't yet support color emoji, we lack some infrastructure for
> that.  Again, work in that area is welcome, it should be relatively
> easy since we now have HarfBuzz support for text shaping.

Actually the color display works already. I tested with current master
(build with cairo) and the emoji display just fine in color.

-- 
Mike FABIAN <mfabian <at> redhat.com>





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 08:26:01 GMT) Full text and rfc822 format available.

Message #14 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 10:25:22 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: 39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 08:36:10 +0100
> 
> > Patches are welcome to convert the emoji-related files in Unicode's
> > character database into appropriate composition-function-table setup,
> > similar to the example above.  Some script to be run at Emacs build
> > time and produce, say, lisp/emoji.el to populate
> > composition-function-table, would be nice (see the Awk scripts in
> > admin/unidata as one source of inspiration).
> 
> Pango also has a .c file which is generated by a python script from
> the Unicode emoji data files to make all these sequences known to Pango.
> 
> I can try to write a script. Would it be OK to use Python for such a
> script generating emoji.el?

I'd prefer not to add Python as prerequisite for building Emacs.  We
already use Awk, so using that'd be fine.

Alternatively, we could do it in Emacs Lisp, similar to
unidata-gen.el, but that requires some care because we cannot run Lisp
programs until we have some version of Emacs.

> > If you mean they are not displayed in correct colors, then Emacs
> > doesn't yet support color emoji, we lack some infrastructure for
> > that.  Again, work in that area is welcome, it should be relatively
> > easy since we now have HarfBuzz support for text shaping.
> 
> Actually the color display works already. I tested with current master
> (build with cairo) and the emoji display just fine in color.

Maybe in a Cairo build.  Or maybe I'm missing something.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 12:23:01 GMT) Full text and rfc822 format available.

Message #17 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39799 <at> debbugs.gnu.org, Mike FABIAN <mfabian <at> redhat.com>
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 13:21:59 +0100
[Message part 1 (text/plain, inline)]
>>>>> On Fri, 28 Feb 2020 10:25:22 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Mike FABIAN <mfabian <at> redhat.com>
    >> Cc: 39799 <at> debbugs.gnu.org
    >> Date: Fri, 28 Feb 2020 08:36:10 +0100
    >> 
    >> > Patches are welcome to convert the emoji-related files in Unicode's
    >> > character database into appropriate composition-function-table setup,
    >> > similar to the example above.  Some script to be run at Emacs build
    >> > time and produce, say, lisp/emoji.el to populate
    >> > composition-function-table, would be nice (see the Awk scripts in
    >> > admin/unidata as one source of inspiration).
    >> 
    >> Pango also has a .c file which is generated by a python script from
    >> the Unicode emoji data files to make all these sequences known to Pango.
    >> 
    >> I can try to write a script. Would it be OK to use Python for such a
    >> script generating emoji.el?

    Eli> I'd prefer not to add Python as prerequisite for building Emacs.  We
    Eli> already use Awk, so using that'd be fine.

I suck at awk, but my attempt is attached. It DTRT for me under Cairo
if I change my fontset settings to use 'Noto Color Emoji' instead of
Symbola for:

             (#x1F300 . #x1F5FF)	;; Misc Symbols and Pictographs
             (#x1F900 . #x1F9FF)	;; Supplemental Symbols and Pictographs

It matches forward off the first char, so the
composition-function-table entries all have '0' as the number of chars
to match. Would it be better to match backwards? Weʼd run into the
4-character maximum for that, since some of the sequences are 7 or
more characters long.

    >> > If you mean they are not displayed in correct colors, then Emacs
    >> > doesn't yet support color emoji, we lack some infrastructure for
    >> > that.  Again, work in that area is welcome, it should be relatively
    >> > easy since we now have HarfBuzz support for text shaping.
    >> 
    >> Actually the color display works already. I tested with current master
    >> (build with cairo) and the emoji display just fine in color.

    Eli> Maybe in a Cairo build.  Or maybe I'm missing something.

Iʼm not seeing colour emoji in a -Q Cairo build. Which sequence is this
again?

Robert

[emoji-zwj.awk (text/plain, inline)]
#!/usr/bin/awk -f

## Copyright (C) 2020 Free Software Foundation, Inc.

## Author: Robert Pluim <rpluim <at> gmail.com>

## This file is part of GNU Emacs.

## GNU Emacs is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.

## GNU Emacs is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.

## You should have received a copy of the GNU General Public License
## along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.

### Commentary:

## This script takes as input Unicode's emoji-zwj-sequences.txt
## (https://www.unicode.org/Public/emoji/12.0/emoji-zwj-sequences.txt)
## and produces output for Emacs's lisp/international/emoji-zwj.el.

## For additional details, see <https://debbugs.gnu.org/39799#8>.

## Things to do after installing a new version of emoji-zwj-sequences.txt:
## Check the output against the old output.

### Code:

/^[0-9A-F]/ {
    sub(/  *;.*/, "", $0)
    num = split($0, elts)
    if (ch[elts[1]] == "")
    {
        vec[elts[1]] = ""
        ch[elts[1]] = elts[1]
    }
    else
    {
        vec[elts[1]] = vec[elts[1]] " "
    }
        vec[elts[1]] = vec[elts[1]] "\""
    for (j = 1; j <= num; j++)
    {
        c = sprintf("\\N{U+%s}", elts[j])
        vec[elts[1]] = vec[elts[1]] c
    }
    vec[elts[1]] = vec[elts[1]] "\""
}

END {
    print ";;; emoji-zwj.el --- emoji zwj character composition table"
    print ";;; Automatically generated from admin/unidata/emoji-zwj-sequences.txt"
    print "(dolist (elt '("

    for (elt in ch)
    {
        printf("(#x%s . (%s))\n", elt, vec[elt])
}
    print "    ))"
    print "  (set-char-table-range composition-function-table"
    print "                        (car elt)"
    print "                        (list (vector (regexp-opt (cdr elt))"
    print "                                      0"
    print "                                      'compose-gstring-for-graphic))))"
    print "\n"
    print "(provide 'emoji-zwj)"
}
[emoji-zwj.el (application/emacs-lisp, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 12:48:02 GMT) Full text and rfc822 format available.

Message #20 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 13:46:50 +0100
[Message part 1 (text/plain, inline)]
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

> I suck at awk, but my attempt is attached. It DTRT for me under Cairo
> if I change my fontset settings to use 'Noto Color Emoji' instead of
> Symbola for:
>
>              (#x1F300 . #x1F5FF)	;; Misc Symbols and Pictographs
>              (#x1F900 . #x1F9FF)	;; Supplemental Symbols and Pictographs
>
> It matches forward off the first char, so the
> composition-function-table entries all have '0' as the number of chars
> to match. Would it be better to match backwards? Weʼd run into the
> 4-character maximum for that, since some of the sequences are 7 or
> more characters long.
>
>     >> > If you mean they are not displayed in correct colors, then Emacs
>     >> > doesn't yet support color emoji, we lack some infrastructure for
>     >> > that.  Again, work in that area is welcome, it should be relatively
>     >> > easy since we now have HarfBuzz support for text shaping.
>     >> 
>     >> Actually the color display works already. I tested with current master
>     >> (build with cairo) and the emoji display just fine in color.
>
>     Eli> Maybe in a Cairo build.  Or maybe I'm missing something.
>
> Iʼm not seeing colour emoji in a -Q Cairo build. Which sequence is this
> again?

To check the colour, almost any emoji will work, it doesn’t have to be a
sequence. For example, I see these in colour:

👩‍🦰 U+1F469 U+200D U+1F9B0 woman: red hair
🧑‍🦰 U+1F9D1 U+200D U+1F9B0 person: red hair
😇 U+1F607

When I start "emacs -Q" (cairo build from current git master), I
see the emoji first in black and white as in the attached
emacs-default-emoji.png.

Then, after evaluating:

(set-fontset-font t '(#x10000 . #x1FFFF) '("Noto Color Emoji" . "unicode-bmp") nil 'prepend)

I see them in colour.

So I have put

(set-fontset-font t '(#x10000 . #x1FFFF) '("Noto Color Emoji" . "unicode-bmp") nil 'prepend)

in my init file.

-- 
Mike FABIAN <mfabian <at> redhat.com>

[emacs-default-emoji.png (image/png, attachment)]
[emacs-color-emoji.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 13:10:02 GMT) Full text and rfc822 format available.

Message #23 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>, Glenn Morris <rgm <at> gnu.org>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 15:08:59 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Mike FABIAN <mfabian <at> redhat.com>,  39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 13:21:59 +0100
> 
>     Eli> I'd prefer not to add Python as prerequisite for building Emacs.  We
>     Eli> already use Awk, so using that'd be fine.
> 
> I suck at awk, but my attempt is attached.

Thanks.  I wonder if we could make the output more human-readable...
Glenn, any advice or comments?

> It DTRT for me under Cairo if I change my fontset settings to use
> 'Noto Color Emoji' instead of Symbola for:

Is that a free font (it's from Google, AFAIK, so it might not be)?  If
it is free, we could modify fontset.el to use this font if available.
(Or maybe there are better free Emoji fonts out there?)

>              (#x1F300 . #x1F5FF)	;; Misc Symbols and Pictographs
>              (#x1F900 . #x1F9FF)	;; Supplemental Symbols and Pictographs
> 
> It matches forward off the first char, so the
> composition-function-table entries all have '0' as the number of chars
> to match. Would it be better to match backwards?

I don't think matching backwards is better in general.  Did you have a
reason for thinking it was?

> Weʼd run into the 4-character maximum for that, since some of the
> sequences are 7 or more characters long.

If the sequences are 7 character long, then the forward-matching
pattern will hit the same limitation as well, no?

>     >> > If you mean they are not displayed in correct colors, then Emacs
>     >> > doesn't yet support color emoji, we lack some infrastructure for
>     >> > that.  Again, work in that area is welcome, it should be relatively
>     >> > easy since we now have HarfBuzz support for text shaping.
>     >> 
>     >> Actually the color display works already. I tested with current master
>     >> (build with cairo) and the emoji display just fine in color.
> 
>     Eli> Maybe in a Cairo build.  Or maybe I'm missing something.
> 
> Iʼm not seeing colour emoji in a -Q Cairo build. Which sequence is this
> again?

The ones in http://www.unicode.org/Public/emoji/12.0/emoji-sequences.txt,
and specifically the flag sequences and the skin color sequences.  At
least AFAIU the original report.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 13:21:02 GMT) Full text and rfc822 format available.

Message #26 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: 39799 <at> debbugs.gnu.org, eliz <at> gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 14:19:57 +0100
>>>>> On Fri, 28 Feb 2020 13:46:50 +0100, Mike FABIAN <mfabian <at> redhat.com> said:
    >> Iʼm not seeing colour emoji in a -Q Cairo build. Which sequence is this
    >> again?

    Mike> To check the colour, almost any emoji will work, it doesn’t have to be a
    Mike> sequence. For example, I see these in colour:

    Mike> 👩‍🦰 U+1F469 U+200D U+1F9B0 woman: red hair
    Mike> 🧑‍🦰 U+1F9D1 U+200D U+1F9B0 person: red hair
    Mike> 😇 U+1F607

    Mike> When I start "emacs -Q" (cairo build from current git master), I
    Mike> see the emoji first in black and white as in the attached
    Mike> emacs-default-emoji.png.

    Mike> Then, after evaluating:

    Mike> (set-fontset-font t '(#x10000 . #x1FFFF) '("Noto Color Emoji" . "unicode-bmp") nil 'prepend)

    Mike> I see them in colour.

    Mike> So I have put

    Mike> (set-fontset-font t '(#x10000 . #x1FFFF) '("Noto Color Emoji" . "unicode-bmp") nil 'prepend)

    Mike> in my init file.

OK, so you were changing the fontsets. That matches what I see.

Hmm, is "Symbola" still a good fallback font? Or should we add "Noto
Color Emoji" or similar in front of it?

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 13:48:02 GMT) Full text and rfc822 format available.

Message #29 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Glenn Morris <rgm <at> gnu.org>, Robert Pluim <rpluim <at> gmail.com>,
 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 14:47:40 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> It DTRT for me under Cairo if I change my fontset settings to use
>> 'Noto Color Emoji' instead of Symbola for:
>
> Is that a free font (it's from Google, AFAIK, so it might not be)?  If
> it is free, we could modify fontset.el to use this font if available.
> (Or maybe there are better free Emoji fonts out there?)

“Noto Color Emoji” is free (Apache 2.0 License):
https://github.com/googlefonts/noto-emoji/blob/master/LICENSE

“Joypixels” is also a nice colour emoji font, but it is *not* free:

https://d1j8pt39hxlh3d.cloudfront.net/contracts/finalized-pdfs/free-5.1.pdf

(free only for personal use).

The nice black and white emoji font “Symbola” is unfortunately not free
either, see:

http://users.teilar.gr/~g1951d/License.pdf

free for “strictly personal and non-commercial purposes”.

-- 
Mike FABIAN <mfabian <at> redhat.com>





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 13:51:02 GMT) Full text and rfc822 format available.

Message #32 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org, eliz <at> gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 14:50:48 +0100
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

> Hmm, is "Symbola" still a good fallback font? Or should we add "Noto
> Color Emoji" or similar in front of it?

I think "Noto Color Emoji" is nicer than "Symbola" if available.
The emoji look much nicer in colour, much easier to distinguish.

I think Symbola is a good black and white fallback if no colour emoji
font is available.

Unfortunately the license of Symbola is not free anymore, it used to be
but it was recently changed to “strictly personal and non-commercial
purposes“:

http://users.teilar.gr/~g1951d/License.pdf

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 13:56:01 GMT) Full text and rfc822 format available.

Message #35 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rgm <at> gnu.org, rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 15:54:53 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: Robert Pluim <rpluim <at> gmail.com>,  Glenn Morris <rgm <at> gnu.org>,
>   39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 14:47:40 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> さんはかきました:
> 
> >> It DTRT for me under Cairo if I change my fontset settings to use
> >> 'Noto Color Emoji' instead of Symbola for:
> >
> > Is that a free font (it's from Google, AFAIK, so it might not be)?  If
> > it is free, we could modify fontset.el to use this font if available.
> > (Or maybe there are better free Emoji fonts out there?)
> 
> “Noto Color Emoji” is free (Apache 2.0 License):
> https://github.com/googlefonts/noto-emoji/blob/master/LICENSE
> 
> “Joypixels” is also a nice colour emoji font, but it is *not* free:
> 
> https://d1j8pt39hxlh3d.cloudfront.net/contracts/finalized-pdfs/free-5.1.pdf
> 
> (free only for personal use).

Thanks for the info.

> The nice black and white emoji font “Symbola” is unfortunately not free
> either, see:
> 
> http://users.teilar.gr/~g1951d/License.pdf
> 
> free for “strictly personal and non-commercial purposes”.

That's the latest version, AFAIK; older versions were free, and can
still be found on the Internet.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 13:57:01 GMT) Full text and rfc822 format available.

Message #38 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 15:56:40 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: 39799 <at> debbugs.gnu.org,  eliz <at> gnu.org
> Date: Fri, 28 Feb 2020 14:50:48 +0100
> 
> Robert Pluim <rpluim <at> gmail.com> さんはかきました:
> 
> > Hmm, is "Symbola" still a good fallback font? Or should we add "Noto
> > Color Emoji" or similar in front of it?
> 
> I think "Noto Color Emoji" is nicer than "Symbola" if available.
> The emoji look much nicer in colour, much easier to distinguish.

Symbola covers much more than just Emoji.  That was the main reason it
was added to fontset.el.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 14:15:01 GMT) Full text and rfc822 format available.

Message #41 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Glenn Morris <rgm <at> gnu.org>, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 15:14:01 +0100
>>>>> On Fri, 28 Feb 2020 15:08:59 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: Mike FABIAN <mfabian <at> redhat.com>,  39799 <at> debbugs.gnu.org
    >> Date: Fri, 28 Feb 2020 13:21:59 +0100
    >> 
    Eli> I'd prefer not to add Python as prerequisite for building Emacs.  We
    Eli> already use Awk, so using that'd be fine.
    >> 
    >> I suck at awk, but my attempt is attached.

    Eli> Thanks.  I wonder if we could make the output more human-readable...
    Eli> Glenn, any advice or comments?

Why does it need to be human-readable? The other files generated from
the unicode data are not particularly readable.

    >> It DTRT for me under Cairo if I change my fontset settings to use
    >> 'Noto Color Emoji' instead of Symbola for:

    Eli> Is that a free font (it's from Google, AFAIK, so it might not be)?  If
    Eli> it is free, we could modify fontset.el to use this font if available.
    Eli> (Or maybe there are better free Emoji fonts out there?)

Its license is Apache 2.0. It seems fairly popular. I have no opinion
either way.

    >> (#x1F300 . #x1F5FF)	;; Misc Symbols and Pictographs
    >> (#x1F900 . #x1F9FF)	;; Supplemental Symbols and Pictographs
    >> 
    >> It matches forward off the first char, so the
    >> composition-function-table entries all have '0' as the number of chars
    >> to match. Would it be better to match backwards?

    Eli> I don't think matching backwards is better in general.  Did you have a
    Eli> reason for thinking it was?

I thought I saw a comment in composite.c that says matching is done
backward, but I see that itʼs done forwards as well.

    >> Weʼd run into the 4-character maximum for that, since some of the
    >> sequences are 7 or more characters long.

    Eli> If the sequences are 7 character long, then the forward-matching
    Eli> pattern will hit the same limitation as well, no?

C-h v composition-function-table says:

    PREV-CHARS is a non-negative integer (less than 4) specifying how many
    characters before C to check the matching with PATTERN.  If it is 0,
    PATTERN must match C and the following characters.  If it is 1,
    PATTERN must match a character before C and the following characters.

which on careful re-reading says that the lookback canʼt be more than
3 characters, but that matching forward has no limit.

    Eli> The ones in http://www.unicode.org/Public/emoji/12.0/emoji-sequences.txt,
    Eli> and specifically the flag sequences and the skin color sequences.  At
    Eli> least AFAIU the original report.

As Mike clarified, you need to change the fontsets in order to get
them to display in colour (uncomposed, of course).

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 14:45:02 GMT) Full text and rfc822 format available.

Message #44 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 15:44:04 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Cc: 39799 <at> debbugs.gnu.org,  eliz <at> gnu.org
>> Date: Fri, 28 Feb 2020 14:50:48 +0100
>> 
>> Robert Pluim <rpluim <at> gmail.com> さんはかきました:
>> 
>> > Hmm, is "Symbola" still a good fallback font? Or should we add "Noto
>> > Color Emoji" or similar in front of it?
>> 
>> I think "Noto Color Emoji" is nicer than "Symbola" if available.
>> The emoji look much nicer in colour, much easier to distinguish.
>
> Symbola covers much more than just Emoji.  That was the main reason it
> was added to fontset.el.

Yes, and I think it should be kept there for all the other symbols.
And also as a fallback for emoji if no nicer colour emoji font is
installed.

Even if the current license is only for personal use, I think it is
still good to have Symbola in fontset.el to be able to use it just by
installing the font.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 14:46:01 GMT) Full text and rfc822 format available.

Message #47 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 16:45:04 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Glenn Morris <rgm <at> gnu.org>,  mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 15:14:01 +0100
> 
>     >> It DTRT for me under Cairo if I change my fontset settings to use
>     >> 'Noto Color Emoji' instead of Symbola for:
> 
>     Eli> Is that a free font (it's from Google, AFAIK, so it might not be)?  If
>     Eli> it is free, we could modify fontset.el to use this font if available.
>     Eli> (Or maybe there are better free Emoji fonts out there?)
> 
> Its license is Apache 2.0. It seems fairly popular. I have no opinion
> either way.

What about the fact that we still support XFT?

And anyway, the name "Noto Color Emoji" seems to imply it's a font
created to display Emoji, not symbols in general, let alone non-symbol
blocks we currently set up to use Symbola if that is available.

>     >> Weʼd run into the 4-character maximum for that, since some of the
>     >> sequences are 7 or more characters long.
> 
>     Eli> If the sequences are 7 character long, then the forward-matching
>     Eli> pattern will hit the same limitation as well, no?
> 
> C-h v composition-function-table says:
> 
>     PREV-CHARS is a non-negative integer (less than 4) specifying how many
>     characters before C to check the matching with PATTERN.  If it is 0,
>     PATTERN must match C and the following characters.  If it is 1,
>     PATTERN must match a character before C and the following characters.
> 
> which on careful re-reading says that the lookback canʼt be more than
> 3 characters, but that matching forward has no limit.

Depends on the patterns used, I guess.

>     Eli> The ones in http://www.unicode.org/Public/emoji/12.0/emoji-sequences.txt,
>     Eli> and specifically the flag sequences and the skin color sequences.  At
>     Eli> least AFAIU the original report.
> 
> As Mike clarified, you need to change the fontsets in order to get
> them to display in colour (uncomposed, of course).

I don't see how that is relevant.  Fontsets are just means to cause
Emacs use a certain font for a certain range of characters.  Fontsets
do not affect color Emoji support.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 14:47:01 GMT) Full text and rfc822 format available.

Message #50 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 16:46:30 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Glenn Morris <rgm <at> gnu.org>,  mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 15:14:01 +0100
> 
>     Eli> Thanks.  I wonder if we could make the output more human-readable...
>     Eli> Glenn, any advice or comments?
> 
> Why does it need to be human-readable? The other files generated from
> the unicode data are not particularly readable.

Readability is desirable because the file will be read by humans.

Which other files are not readable?  I had charscript.el in mind, and
that one is quite readable.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 15:33:01 GMT) Full text and rfc822 format available.

Message #53 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, Robert Pluim <rpluim <at> gmail.com>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 16:32:39 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Robert Pluim <rpluim <at> gmail.com>
>> Cc: Glenn Morris <rgm <at> gnu.org>,  mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
>> Date: Fri, 28 Feb 2020 15:14:01 +0100
>> 
>>     >> It DTRT for me under Cairo if I change my fontset settings to use
>>     >> 'Noto Color Emoji' instead of Symbola for:
>> 
>>     Eli> Is that a free font (it's from Google, AFAIK, so it might not be)?  If
>>     Eli> it is free, we could modify fontset.el to use this font if available.
>>     Eli> (Or maybe there are better free Emoji fonts out there?)
>> 
>> Its license is Apache 2.0. It seems fairly popular. I have no opinion
>> either way.
>
> What about the fact that we still support XFT?

Is it possible to set up the fontsets by default in a way that colour
emoji fonts like "Noto Color Emoji" can be used by default in a cairo
build but avoided by default in an XFT build?

> And anyway, the name "Noto Color Emoji" seems to imply it's a font
> created to display Emoji, not symbols in general, let alone non-symbol
> blocks we currently set up to use Symbola if that is available.

Yes, if possible, "Noto Color Emoji" should be preferred for the emoji
but Symbola should be preferred for all the other symbols.

>> As Mike clarified, you need to change the fontsets in order to get
>> them to display in colour (uncomposed, of course).
>
> I don't see how that is relevant.  Fontsets are just means to cause
> Emacs use a certain font for a certain range of characters.  Fontsets
> do not affect color Emoji support.

Yes, so if you change the fontset to use a colour emoji font for a
certain range of characters (which should be emoji), these emoji will
display in colour in a cairo build.

I am not sure what happens in an XFT build, if possible such unsupported
fonts should be ignored in an XFT build.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 15:36:02 GMT) Full text and rfc822 format available.

Message #56 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 16:35:17 +0100
>>>>> On Fri, 28 Feb 2020 16:46:30 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: Glenn Morris <rgm <at> gnu.org>,  mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
    >> Date: Fri, 28 Feb 2020 15:14:01 +0100
    >> 
    Eli> Thanks.  I wonder if we could make the output more human-readable...
    Eli> Glenn, any advice or comments?
    >> 
    >> Why does it need to be human-readable? The other files generated from
    >> the unicode data are not particularly readable.

    Eli> Readability is desirable because the file will be read by humans.

Hmm, maybe. I guess we could process it in elisp to replace the
characters with their names, and adding extra newlines is
trivial. What other kind of changes did you have in mind?

    Eli> Which other files are not readable?  I had charscript.el in mind, and
    Eli> that one is quite readable.

I had uni-bidi.el in mind, and thatʼs just a dump of a char-table.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 15:40:02 GMT) Full text and rfc822 format available.

Message #59 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 16:39:04 +0100
>>>>> On Fri, 28 Feb 2020 16:45:04 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: Glenn Morris <rgm <at> gnu.org>,  mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
    >> Date: Fri, 28 Feb 2020 15:14:01 +0100
    >> 
    >> >> It DTRT for me under Cairo if I change my fontset settings to use
    >> >> 'Noto Color Emoji' instead of Symbola for:
    >> 
    Eli> Is that a free font (it's from Google, AFAIK, so it might not be)?  If
    Eli> it is free, we could modify fontset.el to use this font if available.
    Eli> (Or maybe there are better free Emoji fonts out there?)
    >> 
    >> Its license is Apache 2.0. It seems fairly popular. I have no opinion
    >> either way.

    Eli> What about the fact that we still support XFT?

I try to forget that :-)

    Eli> And anyway, the name "Noto Color Emoji" seems to imply it's a font
    Eli> created to display Emoji, not symbols in general, let alone non-symbol
    Eli> blocks we currently set up to use Symbola if that is available.

Right. Thereʼs a Noto Emoji font as well.

    >> As Mike clarified, you need to change the fontsets in order to get
    >> them to display in colour (uncomposed, of course).

    Eli> I don't see how that is relevant.  Fontsets are just means to cause
    Eli> Emacs use a certain font for a certain range of characters.  Fontsets
    Eli> do not affect color Emoji support.

They donʼt, no, but in this case changing the fontset was enough to
get the right glyphs to display.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 15:45:02 GMT) Full text and rfc822 format available.

Message #62 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 17:44:08 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
> Date: Fri, 28 Feb 2020 16:35:17 +0100
> 
>     Eli> Readability is desirable because the file will be read by humans.
> 
> Hmm, maybe. I guess we could process it in elisp to replace the
> characters with their names, and adding extra newlines is
> trivial. What other kind of changes did you have in mind?

Just adding newlines, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 15:58:02 GMT) Full text and rfc822 format available.

Message #65 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 16:57:12 +0100
>>>>> On Fri, 28 Feb 2020 16:32:39 +0100, Mike FABIAN <mfabian <at> redhat.com> said:

    Mike> Eli Zaretskii <eliz <at> gnu.org> さんはかきました:
    >>> From: Robert Pluim <rpluim <at> gmail.com>
    >>> Cc: Glenn Morris <rgm <at> gnu.org>,  mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
    >>> Date: Fri, 28 Feb 2020 15:14:01 +0100
    >>> 
    >>> >> It DTRT for me under Cairo if I change my fontset settings to use
    >>> >> 'Noto Color Emoji' instead of Symbola for:
    >>> 
    Eli> Is that a free font (it's from Google, AFAIK, so it might not be)?  If
    Eli> it is free, we could modify fontset.el to use this font if available.
    Eli> (Or maybe there are better free Emoji fonts out there?)
    >>> 
    >>> Its license is Apache 2.0. It seems fairly popular. I have no opinion
    >>> either way.
    >> 
    >> What about the fact that we still support XFT?

    Mike> Is it possible to set up the fontsets by default in a way that colour
    Mike> emoji fonts like "Noto Color Emoji" can be used by default in a cairo
    Mike> build but avoided by default in an XFT build?

Iʼm not sure. I donʼt think we have a (featurep 'xft) or similar, and
parsing system-configuration-features is just icky.

Itʼs possible that adding Noto Color Emoji to a fontset will just
result in it being ignored in an XFT build. Itʼs not something Iʼve
tested.

    Mike> Yes, so if you change the fontset to use a colour emoji font for a
    Mike> certain range of characters (which should be emoji), these emoji will
    Mike> display in colour in a cairo build.

    Mike> I am not sure what happens in an XFT build, if possible such unsupported
    Mike> fonts should be ignored in an XFT build.

Colour fonts are ignored in an XFT build, period. Fonts that are
colour fonts but donʼt get classified as such by fontconfig (such as
"Noto Color Emoji") get added to face-ignored-fonts as and when we
discover them.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 16:20:02 GMT) Full text and rfc822 format available.

Message #68 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 18:19:10 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Glenn Morris <rgm <at> gnu.org>,  mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 15:14:01 +0100
> 
>     >> It matches forward off the first char, so the
>     >> composition-function-table entries all have '0' as the number of chars
>     >> to match. Would it be better to match backwards?
> 
>     Eli> I don't think matching backwards is better in general.  Did you have a
>     Eli> reason for thinking it was?
> 
> I thought I saw a comment in composite.c that says matching is done
> backward, but I see that itʼs done forwards as well.

Btw, it sometimes _can_ be beneficial to use backward matching: if it
makes the size of composition-function-table smaller.  Since
composition-function-table is a char-table, and char-tables allocate
sub-tables only if needed, you can conserve memory (and thus make
Emacs's memory footprint smaller) and faster (because 'aref' will llok
up values in a char-table faster) by setting a smaller number of
slots.  For example, if the 2nd character of an Emoji sequence was
always one specific character, or a small set of characters, you could
set only the slots of those few characters, which would make the
char-table smaller.  OTOH, if that would yield many different
composition rules in the list of rules for those few characters,
redisplay could become slower, because it generally examines the rules
one by one until it finds an appropriate one.  So the winning setup of
composition-function-table is the one that sets the smallest number of
slots, but still keeps the lists of rules for those slots short.  And
note that setting the same rule for a range of codepoints generally
uses up only one slot in the char-table, so rules that can be
generalized to cover many characters are preferable.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 16:26:02 GMT) Full text and rfc822 format available.

Message #71 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 17:24:58 +0100
>>>>> On Fri, 28 Feb 2020 17:44:08 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
    >> Date: Fri, 28 Feb 2020 16:35:17 +0100
    >> 
    Eli> Readability is desirable because the file will be read by humans.
    >> 
    >> Hmm, maybe. I guess we could process it in elisp to replace the
    >> characters with their names, and adding extra newlines is
    >> trivial. What other kind of changes did you have in mind?

    Eli> Just adding newlines, I think.

OK. Iʼll work on that and the required makefile changes.

One thing this has thrown up that I donʼt understand is this:

Most of the emojis in emoji-sequences.txt can be made to use Noto
Color Emoji, but some canʼt. e.g.

#x24c2 Ⓜ

is stubbornly not being displayed using Noto Color Emoji, even though
that font has a glyph for it, and Iʼve added:

     (set-fontset-font "fontset-default" symbol-subgroup
                      '("Noto Color Emoji" . "iso10646-1") nil
                      'prepend)

just after the similar setting for Symbola in
lisp/international/fontset.el

Itʼs not being displayed with the default font, and setting
use-default-font-for-symbols to nil makes no difference. Itʼs using:

    ftcrhb:-GOOG-Noto Sans CJK JP-normal-normal-normal-*-16-*-*-*-*-0-iso10646-1 (#x3F8)

However, if I
eval

     (set-fontset-font nil #x24c2
                      '("Noto Color Emoji" . "iso10646-1") nil
                      'prepend)

in the frame displaying the character, then it does use Noto Color
Emoji. What am I missing?

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 16:39:02 GMT) Full text and rfc822 format available.

Message #74 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 17:38:34 +0100
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

>     Eli> And anyway, the name "Noto Color Emoji" seems to imply it's a font
>     Eli> created to display Emoji, not symbols in general, let alone non-symbol
>     Eli> blocks we currently set up to use Symbola if that is available.
>
> Right. Thereʼs a Noto Emoji font as well.

That has not been updated for a very long time though. Maybe it is dead.

-- 
Mike FABIAN <mfabian <at> redhat.com>





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 16:41:02 GMT) Full text and rfc822 format available.

Message #77 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 17:39:56 +0100
>>>>> On Fri, 28 Feb 2020 18:19:10 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: Glenn Morris <rgm <at> gnu.org>,  mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
    >> Date: Fri, 28 Feb 2020 15:14:01 +0100
    >> 
    >> >> It matches forward off the first char, so the
    >> >> composition-function-table entries all have '0' as the number of chars
    >> >> to match. Would it be better to match backwards?
    >> 
    Eli> I don't think matching backwards is better in general.  Did you have a
    Eli> reason for thinking it was?
    >> 
    >> I thought I saw a comment in composite.c that says matching is done
    >> backward, but I see that itʼs done forwards as well.

    Eli> Btw, it sometimes _can_ be beneficial to use backward matching: if it
    Eli> makes the size of composition-function-table smaller.  Since
    Eli> composition-function-table is a char-table, and char-tables allocate
    Eli> sub-tables only if needed, you can conserve memory (and thus make
    Eli> Emacs's memory footprint smaller) and faster (because 'aref' will llok
    Eli> up values in a char-table faster) by setting a smaller number of
    Eli> slots.  For example, if the 2nd character of an Emoji sequence was
    Eli> always one specific character, or a small set of characters, you could
    Eli> set only the slots of those few characters, which would make the
    Eli> char-table smaller.  OTOH, if that would yield many different
    Eli> composition rules in the list of rules for those few characters,
    Eli> redisplay could become slower, because it generally examines the rules
    Eli> one by one until it finds an appropriate one.  So the winning setup of
    Eli> composition-function-table is the one that sets the smallest number of
    Eli> slots, but still keeps the lists of rules for those slots short.  And
    Eli> note that setting the same rule for a range of codepoints generally
    Eli> uses up only one slot in the char-table, so rules that can be
    Eli> generalized to cover many characters are preferable.

I donʼt think that applies in this case. The sequences are all easily
categorised based on the first char in the sequence. It could be done
based on the 2nd, or 3rd or whatever, but I donʼt think that reduces
the number of entries. Plus thereʼs always one rule per character,
since multiple patterns starting with the same character are combined
using regexp-opt.

One thing though: the code currently does set-char-table-range to a
new value. Is there a chance that an entry already exists in
composition-function-table for a particular character? If so Iʼd have
to change it to add the new rule after the existing one (before?).

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 17:31:02 GMT) Full text and rfc822 format available.

Message #80 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 18:30:12 +0100
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

> One thing this has thrown up that I donʼt understand is this:
>
> Most of the emojis in emoji-sequences.txt can be made to use Noto
> Color Emoji, but some canʼt. e.g.
>
> #x24c2 Ⓜ
>
> is stubbornly not being displayed using Noto Color Emoji, even though
> that font has a glyph for it, and Iʼve added:
>
>      (set-fontset-font "fontset-default" symbol-subgroup
>                       '("Noto Color Emoji" . "iso10646-1") nil
>                       'prepend)
>
> just after the similar setting for Symbola in
> lisp/international/fontset.el
>
> Itʼs not being displayed with the default font, and setting
> use-default-font-for-symbols to nil makes no difference. Itʼs using:
>
>     ftcrhb:-GOOG-Noto Sans CJK JP-normal-normal-normal-*-16-*-*-*-*-0-iso10646-1 (#x3F8)
>
> However, if I
> eval
>
>      (set-fontset-font nil #x24c2
>                       '("Noto Color Emoji" . "iso10646-1") nil
>                       'prepend)
>
> in the frame displaying the character, then it does use Noto Color
> Emoji. What am I missing?
>
> Robert

U+24C2 is an Emoji which has both a text and an emoji presentation. See:

http://unicode.org/reports/tr51/#Emoji_Variation_Selector_Notes
http://unicode.org/reports/tr51/#def_fully_qualified_emoji_zwj_sequence
http://unicode.org/reports/tr51/#def_non_fully_qualified_emoji_zwj_sequence

http://www.unicode.org/Public/emoji/12.0/emoji-data.txt

U+1F600 is an emoji, which has only emoji representation:

$ grep 1F600 emoji-data.txt 
1F600         ; Emoji                # E1.0   [1] (😀)       grinning face
1F600         ; Emoji_Presentation   # E1.0   [1] (😀)       grinning face
1F600         ; Extended_Pictographic# E1.0   [1] (😀)       grinning face

It displays without problems in colour in my Emacs.

Note that U+24C2 does not have the "Emoji_Presentation" tag:

$ grep 24C2 emoji-data.txt 
24C2          ; Emoji                # E0.6   [1] (Ⓜ️)       circled M
24C2          ; Extended_Pictographic# E0.6   [1] (Ⓜ️)       circled M

It has to variations, text representation and emoji representation:

$ grep 24C2 emoji-variation-sequences.txt 
24C2 FE0E  ; text style;  # (1.1) CIRCLED LATIN CAPITAL LETTER M
24C2 FE0F  ; emoji style; # (1.1) CIRCLED LATIN CAPITAL LETTER M

(U+1F600 is not in emoji-variation-sequences.txt as it has only emoji representation).

$ grep 1F600 emoji-test.txt 
1F600                                      ; fully-qualified     # 😀 E1.0 grinning face
$ grep 24C2 emoji-test.txt 
24C2 FE0F                                  ; fully-qualified     # Ⓜ️ E0.6 circled M
24C2                                       ; unqualified         # Ⓜ E0.6 circled M
$

As you can see above, U+1F600 is already fully-qualified on its own.

If I test in gedit, U+24C2 on  its  own is displayed in black and white
(happens to use "MS Gothic" font on my system).
U+24C2 U+FE0E is displayed in black and white in gedit as well.
U+24C2 U+FE0F is displayed in colour in gedit  using the "Noto Color
Emoji" font.

These selectors don’t work in Emacs for me. U+24C2, U+24C2 U+FE0E, and
U+24C2 U+FE0F *all* display in black and white for me in Emacs.

The selectors are displayed as a narrow box.

The presence of such selectors in a currently visible buffer make my
Emacs extremely slow and unresponsive, I can hardly finish typing this
e-mail.

If I switch to some other buffer so that no such selectors are currently
visible, my Emacs is responsive.

Now  that I switched back to this buffer to send this e-mail, it is
terribly slow again. 

Same problem when one of the Unicode emoji data files is displayed which
contains these selectors. Emacs  becomes  unusably slow.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 17:56:02 GMT) Full text and rfc822 format available.

Message #83 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 18:55:24 +0100
[Message part 1 (text/plain, inline)]
Mike FABIAN <mfabian <at> redhat.com> さんはかきました:

> U+24C2 is an Emoji which has both a text and an emoji presentation. See:

Surprisingly, even some ASCII characters are emoji and have a text and
an emoji representation. For example # U+0023:

$ grep 0023 emoji-data.txt 
0023          ; Emoji                # E0.0   [1] (#️)       number sign
0023          ; Emoji_Component      # E0.0   [1] (#️)       number sign
$

Is listed as an Emoji but does not have Emoji_Presentation tag, so
it should usually be displayed as text when not followed by a variation
selector.

$ grep 0023 emoji-variation-sequences.txt 
0023 FE0E  ; text style;  # (1.1) NUMBER SIGN
0023 FE0F  ; emoji style; # (1.1) NUMBER SIGN

When testing this in gedit,


U+23 displays as text using the “DejaVu Sans” font.
U+23 U+FE0E displays as text using the “DejaVu Sans” font.
U+23 U+FE0F displays as an emoji using the “Noto Color Emoji” Font. 

With the “Noto Color Emoji” font this is not very obvious as the glyph
for # in that font looks quite similar to the text version. 

My “emoji-picker” tool displays it the same way as gedit as it also uses
pango, see the 3 attached screenshot showing how it looks like in
DejaVu Sans, Noto Color Emoji, and Joypixels.

-- 
Mike FABIAN <mfabian <at> redhat.com>

[hash-shown-as-emoji-with-joypixels-font.png (image/png, attachment)]
[hash-shown-as-emoji-with-noto-color-emoji-font.png (image/png, attachment)]
[hash-shown-as-text-with-dejavu-sans-font.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 18:03:01 GMT) Full text and rfc822 format available.

Message #86 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 19:01:59 +0100
>>>>> On Fri, 28 Feb 2020 18:30:12 +0100, Mike FABIAN <mfabian <at> redhat.com> said:

    >> #x24c2 Ⓜ
    >> 
    >> is stubbornly not being displayed using Noto Color Emoji, even though
    >> that font has a glyph for it, and Iʼve added:

    Mike> U+24C2 is an Emoji which has both a text and an emoji presentation. See:

    Mike> http://unicode.org/reports/tr51/#Emoji_Variation_Selector_Notes
    Mike> http://unicode.org/reports/tr51/#def_fully_qualified_emoji_zwj_sequence
    Mike> http://unicode.org/reports/tr51/#def_non_fully_qualified_emoji_zwj_sequence

    Mike> http://www.unicode.org/Public/emoji/12.0/emoji-data.txt

    Mike> U+1F600 is an emoji, which has only emoji representation:

    Mike> $ grep 1F600 emoji-data.txt 
    Mike> 1F600         ; Emoji                # E1.0   [1] (😀)       grinning face
    Mike> 1F600         ; Emoji_Presentation   # E1.0   [1] (😀)       grinning face
    Mike> 1F600         ; Extended_Pictographic# E1.0   [1] (😀)       grinning face

    Mike> It displays without problems in colour in my Emacs.

    Mike> Note that U+24C2 does not have the "Emoji_Presentation" tag:

    Mike> $ grep 24C2 emoji-data.txt 
    Mike> 24C2          ; Emoji                # E0.6   [1] (Ⓜ️)       circled M
    Mike> 24C2          ; Extended_Pictographic# E0.6   [1] (Ⓜ️)       circled M

    Mike> It has to variations, text representation and emoji representation:

    Mike> $ grep 24C2 emoji-variation-sequences.txt 
    Mike> 24C2 FE0E  ; text style;  # (1.1) CIRCLED LATIN CAPITAL LETTER M
    Mike> 24C2 FE0F  ; emoji style; # (1.1) CIRCLED LATIN CAPITAL LETTER M

    Mike> (U+1F600 is not in emoji-variation-sequences.txt as it has only emoji representation).

    Mike> $ grep 1F600 emoji-test.txt 
    Mike> 1F600                                      ; fully-qualified     # 😀 E1.0 grinning face
    Mike> $ grep 24C2 emoji-test.txt 
    Mike> 24C2 FE0F                                  ; fully-qualified     # Ⓜ️ E0.6 circled M
    Mike> 24C2                                       ; unqualified         # Ⓜ E0.6 circled M
    Mike> $

    Mike> As you can see above, U+1F600 is already fully-qualified on its own.

    Mike> If I test in gedit, U+24C2 on  its  own is displayed in black and white
    Mike> (happens to use "MS Gothic" font on my system).
    Mike> U+24C2 U+FE0E is displayed in black and white in gedit as well.
    Mike> U+24C2 U+FE0F is displayed in colour in gedit  using the "Noto Color
    Mike> Emoji" font.

OK. How do you determine which font is being used in gedit?

    Mike> These selectors don’t work in Emacs for me. U+24C2, U+24C2 U+FE0E, and
    Mike> U+24C2 U+FE0F *all* display in black and white for me in Emacs.

OK, so itʼs not just me. Iʼll have to do some reading and some
digging.

    Mike> The presence of such selectors in a currently visible buffer make my
    Mike> Emacs extremely slow and unresponsive, I can hardly finish typing this
    Mike> e-mail.

    Mike> If I switch to some other buffer so that no such selectors are currently
    Mike> visible, my Emacs is responsive.

    Mike> Now  that I switched back to this buffer to send this e-mail, it is
    Mike> terribly slow again. 

    Mike> Same problem when one of the Unicode emoji data files is displayed which
    Mike> contains these selectors. Emacs  becomes  unusably slow.

Can you try my patch from
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=39133#41> ? I probably
should have pushed it already...

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 19:30:02 GMT) Full text and rfc822 format available.

Message #89 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 20:29:03 +0100
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

> OK. How do you determine which font is being used in gedit?

By comparing how it looks like in gedit with how it looks like in my own
little emoji-tool “emoji-picker”. “emoji-picker” uses the same rendering
stack as gedit (harfbuzz, cairo, pango).

As you can see in the screenshots of “emoji-picker” attached to my last
mail, when I right click on an emoji I get a popup with some information
about that emoji where I also display which font was actually used to
render that emoji. That might be a different font from what was
requested in the font menu of emoji-picker. A bit similar how you can
check in Emacs with “C-u C-x =” what font was really used for the
character under the cursor.

In “emoji-picker” I can see that parts of an emoji-sequence are
sometimes even rendered in several different fonts if this sequence was
recently added by Unicode and Pango does not know it yet.

https://github.com/mike-fabian/ibus-typing-booster/blob/master/engine/itb_pango.py#L114

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 19:36:02 GMT) Full text and rfc822 format available.

Message #92 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 20:34:44 +0100
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

>     Mike> The presence of such selectors in a currently visible buffer make my
>     Mike> Emacs extremely slow and unresponsive, I can hardly finish typing this
>     Mike> e-mail.
>
> Can you try my patch from
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=39133#41> ? I probably
> should have pushed it already...

Trying, building with your patch now ...

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 20:14:02 GMT) Full text and rfc822 format available.

Message #95 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:13:14 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
> Date: Fri, 28 Feb 2020 17:24:58 +0100
> 
> Most of the emojis in emoji-sequences.txt can be made to use Noto
> Color Emoji, but some canʼt. e.g.
> 
> #x24c2 Ⓜ
> 
> is stubbornly not being displayed using Noto Color Emoji, even though
> that font has a glyph for it, and Iʼve added:
> 
>      (set-fontset-font "fontset-default" symbol-subgroup
>                       '("Noto Color Emoji" . "iso10646-1") nil
>                       'prepend)
> 
> just after the similar setting for Symbola in
> lisp/international/fontset.el
> 
> Itʼs not being displayed with the default font, and setting
> use-default-font-for-symbols to nil makes no difference. Itʼs using:
> 
>     ftcrhb:-GOOG-Noto Sans CJK JP-normal-normal-normal-*-16-*-*-*-*-0-iso10646-1 (#x3F8)
> 
> However, if I
> eval
> 
>      (set-fontset-font nil #x24c2
>                       '("Noto Color Emoji" . "iso10646-1") nil
>                       'prepend)
> 
> in the frame displaying the character, then it does use Noto Color
> Emoji. What am I missing?

Which part makes the difference: the "fontset-default" vs nil or
symbol-subgroup vs an explicit codepoint?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 20:17:01 GMT) Full text and rfc822 format available.

Message #98 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:16:21 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: rgm <at> gnu.org,  mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 17:39:56 +0100
> 
> I donʼt think that applies in this case. The sequences are all easily
> categorised based on the first char in the sequence. It could be done
> based on the 2nd, or 3rd or whatever, but I donʼt think that reduces
> the number of entries. Plus thereʼs always one rule per character,
> since multiple patterns starting with the same character are combined
> using regexp-opt.

I wrote that to describe the general considerations, not necessarily
because I think they are applicable in this particular case.  I didn't
analyze the sequences to see whether any of what I wrote can or should
be used for them.

> One thing though: the code currently does set-char-table-range to a
> new value. Is there a chance that an entry already exists in
> composition-function-table for a particular character?

Only if the non-leading character is a combining character, which I
think is unlikely.  But in general, yes, this should be tested up
front to avoid losing composition rules.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 20:22:02 GMT) Full text and rfc822 format available.

Message #101 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:21:36 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 18:30:12 +0100
> 
> U+24C2 is an Emoji which has both a text and an emoji presentation.

I don't think I understand how this fact is relevant to which font
Emacs selects for a character.  Can you elaborate on the relation?

> If I test in gedit, U+24C2 on  its  own is displayed in black and white
> (happens to use "MS Gothic" font on my system).
> U+24C2 U+FE0E is displayed in black and white in gedit as well.
> U+24C2 U+FE0F is displayed in colour in gedit  using the "Noto Color
> Emoji" font.
> 
> These selectors don’t work in Emacs for me. U+24C2, U+24C2 U+FE0E, and
> U+24C2 U+FE0F *all* display in black and white for me in Emacs.
> 
> The selectors are displayed as a narrow box.

Emacs doesn't yet support display of sequences with variation
selectors, although at least some part of the infrastructure is
already there, see ftfont_variation_glyphs and similar functions in
other HarfBuzz-based font backends.

> The presence of such selectors in a currently visible buffer make my
> Emacs extremely slow and unresponsive, I can hardly finish typing this
> e-mail.

Does it help to set inhibit-compacting-font-caches non-nil?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 20:27:02 GMT) Full text and rfc822 format available.

Message #104 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: mfabian <at> redhat.com, rpluim <at> gmail.com
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50;
 Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:25:44 +0200
> Date: Fri, 28 Feb 2020 22:21:36 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
> 
> Emacs doesn't yet support display of sequences with variation
> selectors, although at least some part of the infrastructure is
> already there, see ftfont_variation_glyphs and similar functions in
> other HarfBuzz-based font backends.

Hmm... it's possible I was confused, and the functions I mentioned are
unrelated to variation selectors.  To see if that's so, try to
configure composition-function-table to display such sequences as
composed characters, and see what happens when you use a proper font
(e.g., the one with which Gedit displays the variations).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 20:39:02 GMT) Full text and rfc822 format available.

Message #107 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 21:38:07 +0100
>>>>> On Fri, 28 Feb 2020 22:13:14 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> (set-fontset-font nil #x24c2
    >> '("Noto Color Emoji" . "iso10646-1") nil
    >> 'prepend)
    >> 
    >> in the frame displaying the character, then it does use Noto Color
    >> Emoji. What am I missing?

    Eli> Which part makes the difference: the "fontset-default" vs nil or
    Eli> symbol-subgroup vs an explicit codepoint?

symbol-subgroup in that context is (#x2460 . #x24FF)	;; Enclosed
Alphanumerics
so I suspect it's the nil rather than "fontset-default". <time passes>

     (set-fontset-font nil '(#x2460 . #x24FF)
     '("Noto Color Emoji" . "iso10646-1") nil
    'prepend)

Makes the character display using Noto Color Emoji.

     (set-fontset-font "fontset-default" '(#x2460 . #x24FF)
     '("Noto Color Emoji" . "iso10646-1") nil
    'prepend)

gives me:

Debugger entered--Lisp error: (error "Fontset ‘default-fontset’ does not exist")
  set-fontset-font("default-fontset" (9312 . 9471) ("Noto Color Emoji" . "iso10646-1") nil prepend)
  (progn (set-fontset-font "default-fontset" '(9312 . 9471) '("Noto Color Emoji" . "iso10646-1") nil 'prepend))
  eval((progn (set-fontset-font "default-fontset" '(9312 . 9471) '("Noto Color Emoji" . "iso10646-1") nil 'prepend)) t)
  elisp--eval-last-sexp(nil)
  eval-last-sexp(nil)
  funcall-interactively(eval-last-sexp nil)
  call-interactively(eval-last-sexp nil nil)
  command-execute(eval-last-sexp)

and similarly if I specify a single character rather than a
range. Using 't' instead of "default-fontset" doesnʼt error, but
doesnʼt cause any font changes either.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 20:56:02 GMT) Full text and rfc822 format available.

Message #110 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:55:10 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
> Date: Fri, 28 Feb 2020 21:38:07 +0100
> 
>      (set-fontset-font "fontset-default" '(#x2460 . #x24FF)
>      '("Noto Color Emoji" . "iso10646-1") nil
>     'prepend)
> 
> gives me:
> 
> Debugger entered--Lisp error: (error "Fontset ‘default-fontset’ does not exist")
>   set-fontset-font("default-fontset" (9312 . 9471) ("Noto Color Emoji" . "iso10646-1") nil prepend)
>   (progn (set-fontset-font "default-fontset" '(9312 . 9471) '("Noto Color Emoji" . "iso10646-1") nil 'prepend))
>   eval((progn (set-fontset-font "default-fontset" '(9312 . 9471) '("Noto Color Emoji" . "iso10646-1") nil 'prepend)) t)

You used "default-fontset" instead of "fontset-default", thus the
error.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 20:57:02 GMT) Full text and rfc822 format available.

Message #113 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 21:56:04 +0100
>>>>> On Fri, 28 Feb 2020 22:16:21 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> One thing though: the code currently does set-char-table-range to a
    >> new value. Is there a chance that an entry already exists in
    >> composition-function-table for a particular character?

    Eli> Only if the non-leading character is a combining character, which I
    Eli> think is unlikely.  But in general, yes, this should be tested up
    Eli> front to avoid losing composition rules.

OK, Iʼll take that into account.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:03:02 GMT) Full text and rfc822 format available.

Message #116 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: mfabian <at> redhat.com, rpluim <at> gmail.com
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50;
 Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 23:02:12 +0200
> Date: Fri, 28 Feb 2020 22:25:44 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 39799 <at> debbugs.gnu.org
> 
> Hmm... it's possible I was confused, and the functions I mentioned are
> unrelated to variation selectors.  To see if that's so, try to
> configure composition-function-table to display such sequences as
> composed characters, and see what happens when you use a proper font
> (e.g., the one with which Gedit displays the variations).

Looking into this some more reveals that we already have
composition-function-table set up for variation selectors, see the end
of lisp/language/japanese.el.  Not sure why it's in japanese.el, but
the code doesn't seem to be specific to Japanese characters, unless
I'm missing something.  So some debugging is required to understand
why we don't display sequences with variation selectors as intended.
Maybe DejaVu Sans doesn't support that?  What if you try

  emacs -Q -fn Noto Color Emoji"

does Emacs built with HarfBuzz then display the variation sequences as
expected?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:11:02 GMT) Full text and rfc822 format available.

Message #119 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:10:43 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>,  39799 <at> debbugs.gnu.org
>> Date: Fri, 28 Feb 2020 18:30:12 +0100
>> 
>> U+24C2 is an Emoji which has both a text and an emoji presentation.
>
> I don't think I understand how this fact is relevant to which font
> Emacs selects for a character.  Can you elaborate on the relation?

That means that U+24C2 should display using a black and white “text”
font like Symbola and U+24C2 U+FE0E as well, if possible. But U+24C2
U+FE0F should be displayed with a color emoji font like “Noto Color
Emoji”, if possible.

That’s how it works in gedit for example.

By browsing the emoji data files with Emacs, I noticed that all emoji
which have text and emoji presentations *always* displayed in black and
white for me (Symbola font) even when I tried to set the fontset to use
“Noto Color Emoji” for these code points.

But for emoji which do not have these variants, setting the fontset to
use “Noto Color Emoji” worked.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:16:01 GMT) Full text and rfc822 format available.

Message #122 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:14:53 +0100
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

>>>>>> On Fri, 28 Feb 2020 22:13:14 +0200, Eli Zaretskii <eliz <at> gnu.org> said:
>
>     >> (set-fontset-font nil #x24c2
>     >> '("Noto Color Emoji" . "iso10646-1") nil
>     >> 'prepend)
>     >> 
>     >> in the frame displaying the character, then it does use Noto Color
>     >> Emoji. What am I missing?
>
>     Eli> Which part makes the difference: the "fontset-default" vs nil or
>     Eli> symbol-subgroup vs an explicit codepoint?
>
> symbol-subgroup in that context is (#x2460 . #x24FF)	;; Enclosed
> Alphanumerics
> so I suspect it's the nil rather than "fontset-default". <time passes>
>
>      (set-fontset-font nil '(#x2460 . #x24FF)
>      '("Noto Color Emoji" . "iso10646-1") nil
>     'prepend)
>
> Makes the character display using Noto Color Emoji.

That works for me as well!

But would it be possible to display it in Symbola if not followed by the
emoji representation selector and in Noto Color Emoji if followed by the
emoji representation selector??

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:23:01 GMT) Full text and rfc822 format available.

Message #125 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:22:15 +0100
>>>>> On Fri, 28 Feb 2020 22:55:10 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
    >> Date: Fri, 28 Feb 2020 21:38:07 +0100
    >> 
    >> (set-fontset-font "fontset-default" '(#x2460 . #x24FF)
    >> '("Noto Color Emoji" . "iso10646-1") nil
    >> 'prepend)
    >> 
    >> gives me:
    >> 
    >> Debugger entered--Lisp error: (error "Fontset ‘default-fontset’ does not exist")
    >> set-fontset-font("default-fontset" (9312 . 9471) ("Noto Color Emoji" . "iso10646-1") nil prepend)
    >> (progn (set-fontset-font "default-fontset" '(9312 . 9471) '("Noto Color Emoji" . "iso10646-1") nil 'prepend))
    >> eval((progn (set-fontset-font "default-fontset" '(9312 . 9471) '("Noto Color Emoji" . "iso10646-1") nil 'prepend)) t)

    Eli> You used "default-fontset" instead of "fontset-default", thus the
    Eli> error.

D'oh. Friday night strikes again :-)

With that fixed, neither the range or specific char variant makes any
difference, I need to use nil.

One other thing: the #x24C2 is not composed with the following #xFE0F
when itʼs displayed using Google Noto Sans. If I get it to display
with Noto Color Emoji it *is* composed, even though I haven't set up
any composition-function-table entries for it. Where is that
composition coming from?

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:28:01 GMT) Full text and rfc822 format available.

Message #128 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:27:11 +0100
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

> One other thing: the #x24C2 is not composed with the following #xFE0F
> when itʼs displayed using Google Noto Sans.

Yes, I also noticed the #xFE0F displayed as a box.

> If I get it to display
> with Noto Color Emoji it *is* composed, even though I haven't set up
> any composition-function-table entries for it. Where is that
> composition coming from?

No idea, I didn’t notice this until you mentioned it, but
I see this as well.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:33:02 GMT) Full text and rfc822 format available.

Message #131 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:32:10 +0100
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

>     Mike> The presence of such selectors in a currently visible buffer make my
>     Mike> Emacs extremely slow and unresponsive, I can hardly finish typing this
>     Mike> e-mail.

> Can you try my patch from
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=39133#41> ? I probably
> should have pushed it already...

Great, that makes it fast again. With this patch, I can type normally
in a buffer containing variation selectors.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:39:01 GMT) Full text and rfc822 format available.

Message #134 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:38:41 +0100
>>>>> On Fri, 28 Feb 2020 22:32:10 +0100, Mike FABIAN <mfabian <at> redhat.com> said:

    Mike> Robert Pluim <rpluim <at> gmail.com> さんはかきました:
    Mike> The presence of such selectors in a currently visible buffer make my
    Mike> Emacs extremely slow and unresponsive, I can hardly finish typing this
    Mike> e-mail.

    >> Can you try my patch from
    >> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=39133#41> ? I probably
    >> should have pushed it already...

    Mike> Great, that makes it fast again. With this patch, I can type normally
    Mike> in a buffer containing variation selectors.

Thanks for testing. Iʼll push it this weekend sometime.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:48:02 GMT) Full text and rfc822 format available.

Message #137 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 22:47:36 +0100
>>>>> On Fri, 28 Feb 2020 23:02:12 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    >> Date: Fri, 28 Feb 2020 22:25:44 +0200
    >> From: Eli Zaretskii <eliz <at> gnu.org>
    >> Cc: 39799 <at> debbugs.gnu.org
    >> 
    >> Hmm... it's possible I was confused, and the functions I mentioned are
    >> unrelated to variation selectors.  To see if that's so, try to
    >> configure composition-function-table to display such sequences as
    >> composed characters, and see what happens when you use a proper font
    >> (e.g., the one with which Gedit displays the variations).

    Eli> Looking into this some more reveals that we already have
    Eli> composition-function-table set up for variation selectors, see the end
    Eli> of lisp/language/japanese.el.  Not sure why it's in japanese.el, but
    Eli> the code doesn't seem to be specific to Japanese characters, unless
    Eli> I'm missing something.  So some debugging is required to understand
    Eli> why we don't display sequences with variation selectors as intended.
    Eli> Maybe DejaVu Sans doesn't support that?  What if you try

    Eli>   emacs -Q -fn Noto Color Emoji"

    Eli> does Emacs built with HarfBuzz then display the variation sequences as
    Eli> expected?

-fn "Noto Color Emoji" doesnʼt change the default font for me for some
 reason, but if I change the font after startup then those sequences
 display correctly.

  (char-table-range composition-function-table #xFE0F)
=> ([".." 1 compose-gstring-for-variation-glyph])

and #xFE0F is always composable according to composite.c, so I donʼt
understand why composing only works with Noto Color Emoji. Or does the
font need specific support for it?

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:50:02 GMT) Full text and rfc822 format available.

Message #140 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 23:49:40 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 22:10:43 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> さんはかきました:
> 
> >> From: Mike FABIAN <mfabian <at> redhat.com>
> >> Cc: Eli Zaretskii <eliz <at> gnu.org>,  39799 <at> debbugs.gnu.org
> >> Date: Fri, 28 Feb 2020 18:30:12 +0100
> >> 
> >> U+24C2 is an Emoji which has both a text and an emoji presentation.
> >
> > I don't think I understand how this fact is relevant to which font
> > Emacs selects for a character.  Can you elaborate on the relation?
> 
> That means that U+24C2 should display using a black and white “text”
> font like Symbola and U+24C2 U+FE0E as well, if possible. But U+24C2
> U+FE0F should be displayed with a color emoji font like “Noto Color
> Emoji”, if possible.
> 
> That’s how it works in gedit for example.

If Gedit selects a font by looking at more than one codepoint (and I'm
not sure this is how it works in Gedit), then Emacs doesn't work that
way.

But regardless of how a font is selected, a single U+24C2 should be
displayed as its default glyph in a font, whereas U+24C2 followed by
U+FE0F should be displayed as a different glyph, the 16th variation of
U+24C2 provided by the font.  That variation could be a color one, or
it could be some other variation of the default glyph, it all depends
on what the font designer did.

> By browsing the emoji data files with Emacs, I noticed that all emoji
> which have text and emoji presentations *always* displayed in black and
> white for me (Symbola font) even when I tried to set the fontset to use
> “Noto Color Emoji” for these code points.

Emacs by default disregards the fontset for symbol and punctuation
characters.  Set use-default-font-for-symbols to nil to override that.

In any case, are these sequences displayed as composed characters?
Does "C-u C-x =" tell that the base character U+24C2 was composed with
the following variation selector?  According to the setup in
japanese.el, they should compose, if the font used for U+24C2 also
supports the variation selectors.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:52:02 GMT) Full text and rfc822 format available.

Message #143 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 23:50:47 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 22:14:53 +0100
> 
> But would it be possible to display it in Symbola if not followed by the
> emoji representation selector and in Noto Color Emoji if followed by the
> emoji representation selector??

No.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 21:53:02 GMT) Full text and rfc822 format available.

Message #146 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 23:52:09 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
> Date: Fri, 28 Feb 2020 22:22:15 +0100
> 
> One other thing: the #x24C2 is not composed with the following #xFE0F
> when itʼs displayed using Google Noto Sans. If I get it to display
> with Noto Color Emoji it *is* composed, even though I haven't set up
> any composition-function-table entries for it. Where is that
> composition coming from?

Emacs can only compose characters if the font supports all of the
codepoints that are being composed.  So you need to choose a font that
supports these compositions.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 28 Feb 2020 22:08:02 GMT) Full text and rfc822 format available.

Message #149 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 00:07:05 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
> Date: Fri, 28 Feb 2020 22:47:36 +0100
> 
> -fn "Noto Color Emoji" doesnʼt change the default font for me for some
>  reason, but if I change the font after startup then those sequences
>  display correctly.
> 
>   (char-table-range composition-function-table #xFE0F)
> => ([".." 1 compose-gstring-for-variation-glyph])

OK, so this feature already works for suitable fonts.

> and #xFE0F is always composable according to composite.c, so I donʼt
> understand why composing only works with Noto Color Emoji. Or does the
> font need specific support for it?

Yes, the font needs to have glyph variations, see
font-variation-glyphs and its underlying font-backend method
get_variation_glyphs.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 07:51:01 GMT) Full text and rfc822 format available.

Message #152 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Robert Pluim <rpluim <at> gmail.com>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 08:50:40 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Robert Pluim <rpluim <at> gmail.com>
>> Cc: mfabian <at> redhat.com,  39799 <at> debbugs.gnu.org
>> Date: Fri, 28 Feb 2020 22:47:36 +0100
>> 
>> -fn "Noto Color Emoji" doesnʼt change the default font for me for some
>>  reason, but if I change the font after startup then those sequences
>>  display correctly.
>> 
>>   (char-table-range composition-function-table #xFE0F)
>> => ([".." 1 compose-gstring-for-variation-glyph])
>
> OK, so this feature already works for suitable fonts.
>
>> and #xFE0F is always composable according to composite.c, so I donʼt
>> understand why composing only works with Noto Color Emoji. Or does the
>> font need specific support for it?
>
> Yes, the font needs to have glyph variations, see
> font-variation-glyphs and its underlying font-backend method
> get_variation_glyphs.

http://unicode.org/reports/tr51/#Presentation_Style

doesn’t seem to say that the fonts should have the variations.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 08:01:01 GMT) Full text and rfc822 format available.

Message #155 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 08:59:49 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

> If Gedit selects a font by looking at more than one codepoint (and I'm
> not sure this is how it works in Gedit), then Emacs doesn't work that
> way.

Yes, Gedit does this somehow with pango. It tries to avoid switching
fonts in places where it would look bad. For example, if you have a
default font supporting only ASCII and then there is a word containing
some non-ASCII character like “grün” it chooses a font containing the
“ü” for the whole word to avoid the “ü” looking out of place.

> In any case, are these sequences displayed as composed characters?
> Does "C-u C-x =" tell that the base character U+24C2 was composed with
> the following variation selector?  According to the setup in
> japanese.el, they should compose, if the font used for U+24C2 also
> supports the variation selectors.

Yes, it does tell that it was composed with the following character:

             position: 255 of 257 (99%), column: 0
            character: Ⓜ (displayed as Ⓜ) (codepoint 9410, #o22302, #x24c2)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0x24C2
               script: symbol
               syntax: w 	which means: word
             category: .:Base, L:Left-to-right (strong), l:Latin
             to input: type "C-x 8 RET 24c2" or "C-x 8 RET CIRCLED LATIN CAPITAL LETTER M"
          buffer code: #xE2 #x93 #x82
            file code: #xE2 #x93 #x82 (encoded by coding system utf-8-unix)
              display: composed to form "Ⓜ️" (see below)

Composed with the following character(s) "️" using this font:
  ftcrhb:-GOOG-Noto Color Emoji-normal-normal-normal-*-16-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 9410 50 20 0 20 15 4 nil]

Character code properties: customize what to show
  name: CIRCLED LATIN CAPITAL LETTER M
  general-category: So (Symbol, Other)
  decomposition: (circle 77) (circle 'M')

There are text properties here:
  fontified            nil

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 08:02:02 GMT) Full text and rfc822 format available.

Message #158 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Robert Pluim <rpluim <at> gmail.com>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 09:01:37 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Robert Pluim <rpluim <at> gmail.com>
>> Cc: 39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
>> Date: Fri, 28 Feb 2020 22:22:15 +0100
>> 
>> One other thing: the #x24C2 is not composed with the following #xFE0F
>> when itʼs displayed using Google Noto Sans. If I get it to display
>> with Noto Color Emoji it *is* composed, even though I haven't set up
>> any composition-function-table entries for it. Where is that
>> composition coming from?
>
> Emacs can only compose characters if the font supports all of the
> codepoints that are being composed.  So you need to choose a font that
> supports these compositions.

I think there are no fonts supporting both the emoji representations and
text representations of emoji which have both.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 09:41:02 GMT) Full text and rfc822 format available.

Message #161 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 11:40:17 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: Robert Pluim <rpluim <at> gmail.com>,  39799 <at> debbugs.gnu.org
> Date: Sat, 29 Feb 2020 08:50:40 +0100
> 
> >> and #xFE0F is always composable according to composite.c, so I donʼt
> >> understand why composing only works with Noto Color Emoji. Or does the
> >> font need specific support for it?
> >
> > Yes, the font needs to have glyph variations, see
> > font-variation-glyphs and its underlying font-backend method
> > get_variation_glyphs.
> 
> http://unicode.org/reports/tr51/#Presentation_Style
> 
> doesn’t seem to say that the fonts should have the variations.

Please elaborate: which part thereof says that, and what are the
implications regarding the fonts?

The rendering of Emoji sequences is handled in Emacs via the font
backend: Emacs submits the sequence to the backend, and the backend
returns one or more glyphs that should be used to display the
sequence.  Emacs only submits a sequence of characters to the backend
if the sequence matches one of the composition rules in
composition-function-table.  And the possible match for such
composition rules is limited to character sequences that have the same
'face' text property, which in particular means the same font.  In the
case of variation selectors as part of the characters to be composed,
Emacs additionally tests that the face's font has a glyph for the
specified variation selector.

If you are saying some of the above contradicts Unicode, please point
out which part(s) and why.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 09:50:02 GMT) Full text and rfc822 format available.

Message #164 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 11:49:25 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: Robert Pluim <rpluim <at> gmail.com>,  39799 <at> debbugs.gnu.org
> Date: Sat, 29 Feb 2020 09:01:37 +0100
> 
> > Emacs can only compose characters if the font supports all of the
> > codepoints that are being composed.  So you need to choose a font that
> > supports these compositions.
> 
> I think there are no fonts supporting both the emoji representations and
> text representations of emoji which have both.

Sorry, I don't understand: what does "which have both" refer to?

Emacs doesn't create the text and emoji presentations, it just hands
the sequences to the font backend and asks the backend to provide the
font glyphs to display that sequence.  The rest is between the font
backend and the font.  And of course all this depends on
composition-function-table being set up to support these sequences.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 10:06:02 GMT) Full text and rfc822 format available.

Message #167 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 12:04:54 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
> Date: Sat, 29 Feb 2020 08:59:49 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> さんはかきました:
> 
> > If Gedit selects a font by looking at more than one codepoint (and I'm
> > not sure this is how it works in Gedit), then Emacs doesn't work that
> > way.
> 
> Yes, Gedit does this somehow with pango. It tries to avoid switching
> fonts in places where it would look bad. For example, if you have a
> default font supporting only ASCII and then there is a word containing
> some non-ASCII character like “grün” it chooses a font containing the
> “ü” for the whole word to avoid the “ü” looking out of place.

Well, "somehow" is not enough to see whether we have any additional
work to do in Emacs, because Emacs also tries to achieve that same
goal.  There are many different ways to achieve it, though; for
example, Emacs will AFAIK by default not even use a font that could
support ASCII, but not Latin-1 blocks as the default face's font.

What you say about Gedit makes sense in general, but questions
immediately pop up: how does Gedit define a "word" (Emacs, as you
know, has very a flexible definition that can be controlled from
Lisp), how does it "know" that a word like "grün" belongs to the same
script (otherwise displaying a character from another script using a
different font, as in, say, "grאn" might make sense), etc.

IOW, what we need is a detailed description of what Pango does here,
and how does Gedit affect that by configuring its default fonts.  Only
then we can reason about the differences between that and what Emacs
does.

> > In any case, are these sequences displayed as composed characters?
> > Does "C-u C-x =" tell that the base character U+24C2 was composed with
> > the following variation selector?  According to the setup in
> > japanese.el, they should compose, if the font used for U+24C2 also
> > supports the variation selectors.
> 
> Yes, it does tell that it was composed with the following character:

And the resulting display is what you expect?  If not, then I think
you need to find a font which supports Emoji presentation of
characters such as Ⓜ, and make Emacs use it for those sequences.

If you think this Emacs requirement for a capable font is incorrect, I
suggest to post a question about this to the HarfBuzz mailing list,
harfbuzz <at> lists.freedesktop.org, maybe HarfBuzz has capabilities in
this regard that we somehow don't yet utilize.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 10:27:02 GMT) Full text and rfc822 format available.

Message #170 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 11:26:11 +0100
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Cc: Robert Pluim <rpluim <at> gmail.com>,  39799 <at> debbugs.gnu.org
>> Date: Sat, 29 Feb 2020 09:01:37 +0100
>> 
>> > Emacs can only compose characters if the font supports all of the
>> > codepoints that are being composed.  So you need to choose a font that
>> > supports these compositions.
>> 
>> I think there are no fonts supporting both the emoji representations and
>> text representations of emoji which have both.
>
> Sorry, I don't understand: what does "which have both" refer to?

There are some emoji which have both emoji and text representations,
for example:

$ grep 24C2 emoji-variation-sequences.txt
24C2 FE0E  ; text style;  # (1.1) CIRCLED LATIN CAPITAL LETTER M
24C2 FE0F  ; emoji style; # (1.1) CIRCLED LATIN CAPITAL LETTER M
$ grep 24C2 emoji-data.txt
24C2          ; Emoji                # E0.6   [1] (Ⓜ️)       circled M
24C2          ; Extended_Pictographic# E0.6   [1] (Ⓜ️)       circled M
$

Other (most) emoji have only emoji representation, for example:

$ grep 1F600 emoji-data.txt
1F600         ; Emoji                # E1.0   [1] (😀)       grinning face
1F600         ; Emoji_Presentation   # E1.0   [1] (😀)       grinning face
1F600         ; Extended_Pictographic# E1.0   [1] (😀)       grinning face
$

> Emacs doesn't create the text and emoji presentations, it just hands
> the sequences to the font backend and asks the backend to provide the
> font glyphs to display that sequence.  The rest is between the font
> backend and the font.  And of course all this depends on
> composition-function-table being set up to support these sequences.

I think the specification in Unicode is quite vague on how this should
be implemented. It doesn’t say anything about whether the fonts should
do it or the application.

If I understand it correctly, with the current implementation in Emacs,
it would work if a font had glyphs for both variation selectors.

I.e. if there was a font which had a colour glyph for

24C2 FE0F  ; emoji style; # (1.1) CIRCLED LATIN CAPITAL LETTER M

*and* a black and white glyph for

24C2 FE0E  ; text style;  # (1.1) CIRCLED LATIN CAPITAL LETTER M

Then it would work in Emacs and 24C2 FE0F would display in colour
and 24C2 FE0E in black and white.

But there are no such fonts (yet??). Symbola doesn’t support the
variation selectors at all, i.e. when using Symbola

(set-fontset-font nil '(#x2460 . #x24FF) '("Symbola" . "iso10646-1") nil 'prepend)

one gets the three variants

Ⓜ U+24C2
Ⓜ︎ U+24C2 U+FE0E
Ⓜ️ U+24C2 U+FE0F

all in black and white and for the two variants which have the variation
selectors one sees a narrow box in Emacs.

When using “Noto Color Emoji” or “Joypixels”, one gets all three
variants in colour, and a box is only shown for the line in the middle
with the U+FE0E text style selector because neither “Noto Color Emoji”
nor “Joypixels” seem to implement that one. The emoji style selector
U+FE0F does not show a box though, if I understand you correctly that
means that apparently both “Noto Color Emoji” and “Joypixels” implement
the U+FE0F variation selector.

If I paste these 3 lines into gedit (or anything else which uses pango
for this) I see that different fonts  are used. Can also be seen with

pango-view --font="DejaVu Sans" ~/emoji-representation-test.txt

(I attached the emoji-representation-test.txt file and how it is
displayed by pango-view).

I specified the DejaVu Sans font on the command line (which is used for
the ASCII text in that screenshot. For the emoji, different fonts are
used, on my system where I made that screenshot it happens to be the
font “MS Gothic” for the emoji in the first two lines and “Noto Color
Emoji” for the last line. So pango uses different fonts for a text
representation emoji sequence than for emoji representation.

The specification in Unicode does not seem to say that this is not
allowed and that the font has to do it, it seems a bit vague there.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。

[emoji-representation-test.txt (text/plain, inline)]
Ⓜ U+24C2
Ⓜ︎ U+24C2 U+FE0E
Ⓜ️ U+24C2 U+FE0F
[pango-view-emoji-representation-test.txt.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 10:46:02 GMT) Full text and rfc822 format available.

Message #173 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 11:45:10 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Cc: Robert Pluim <rpluim <at> gmail.com>,  39799 <at> debbugs.gnu.org
>> Date: Sat, 29 Feb 2020 08:50:40 +0100
>> 
>> >> and #xFE0F is always composable according to composite.c, so I donʼt
>> >> understand why composing only works with Noto Color Emoji. Or does the
>> >> font need specific support for it?
>> >
>> > Yes, the font needs to have glyph variations, see
>> > font-variation-glyphs and its underlying font-backend method
>> > get_variation_glyphs.
>> 
>> http://unicode.org/reports/tr51/#Presentation_Style
>> 
>> doesn’t seem to say that the fonts should have the variations.
>
> Please elaborate: which part thereof says that, and what are the
> implications regarding the fonts?

I think it is a bit vague. It does not say that this should be handled
by the fonts having glyphs for both styles, it does not seems to say
that it should be handled by switching fonts according to the variation
either. Currently no font which implements both the text and the emoji
representation seems to exist. So currently one can only make it work by
switching fonts depending on whether one wants to show text or emoji
representation. Pango does it that way.

I don’t know whether this is supposed to be the “right” way to it.

> The rendering of Emoji sequences is handled in Emacs via the font
> backend: Emacs submits the sequence to the backend, and the backend
> returns one or more glyphs that should be used to display the
> sequence.  Emacs only submits a sequence of characters to the backend
> if the sequence matches one of the composition rules in
> composition-function-table.  And the possible match for such
> composition rules is limited to character sequences that have the same
> 'face' text property, which in particular means the same font.  In the
> case of variation selectors as part of the characters to be composed,
> Emacs additionally tests that the face's font has a glyph for the
> specified variation selector.

So this would work if a font had both black and white glyphs and colored
glyphs and used the variation selectors to select the desired glyph.

Maybe there should be a font like this, but currently no such font seems
to exist. 

> If you are saying some of the above contradicts Unicode, please point
> out which part(s) and why.

I don’t think it contradicts Unicode. At least I am not sure, I think
the specification is not clear how this should be done.

But even if it doesn’t contradict Unicode, it means that it won't work
well with currently available fonts.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 11:15:01 GMT) Full text and rfc822 format available.

Message #176 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 12:14:28 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
>> Date: Sat, 29 Feb 2020 08:59:49 +0100
>> 
>> Eli Zaretskii <eliz <at> gnu.org> さんはかきました:
>> 
>> > If Gedit selects a font by looking at more than one codepoint (and I'm
>> > not sure this is how it works in Gedit), then Emacs doesn't work that
>> > way.
>> 
>> Yes, Gedit does this somehow with pango. It tries to avoid switching
>> fonts in places where it would look bad. For example, if you have a
>> default font supporting only ASCII and then there is a word containing
>> some non-ASCII character like “grün” it chooses a font containing the
>> “ü” for the whole word to avoid the “ü” looking out of place.
>
> Well, "somehow" is not enough to see whether we have any additional
> work to do in Emacs, because Emacs also tries to achieve that same
> goal.  There are many different ways to achieve it, though; for
> example, Emacs will AFAIK by default not even use a font that could
> support ASCII, but not Latin-1 blocks as the default face's font.
>
> What you say about Gedit makes sense in general, but questions
> immediately pop up: how does Gedit define a "word" (Emacs, as you
> know, has very a flexible definition that can be controlled from
> Lisp), how does it "know" that a word like "grün" belongs to the same
> script (otherwise displaying a character from another script using a
> different font, as in, say, "grאn" might make sense), etc.

Yes, “word” is already too simplified.


> IOW, what we need is a detailed description of what Pango does here,
> and how does Gedit affect that by configuring its default fonts.  Only
> then we can reason about the differences between that and what Emacs
> does.

Yes, you are right, and I think this is very difficult.

I don’t know the details, but Pango seems to “cut” text into “runs”
where each “run” is rendered with a single font. And it tries to
cut the text into “runs” in a way that the overall result looks
as nice as possible. This is really difficult and doesn’t always
work well, sometimes the results are ugly although overall it seems to
do a good job.

>> > In any case, are these sequences displayed as composed characters?
>> > Does "C-u C-x =" tell that the base character U+24C2 was composed with
>> > the following variation selector?  According to the setup in
>> > japanese.el, they should compose, if the font used for U+24C2 also
>> > supports the variation selectors.
>> 
>> Yes, it does tell that it was composed with the following character:
>
> And the resulting display is what you expect?  If not, then I think
> you need to find a font which supports Emoji presentation of
> characters such as Ⓜ, and make Emacs use it for those sequences.

Yes, in the case of Ⓜ️ U+24C2 U+FE0F the result in Emacs is perfect
when using “Noto Color Emoji” or “Joypixels”. It is displayed in colour
and behaves as a single character in the buffer, the variation selector
is not displayed as a box. This is perfect.

But when using Symbola for the same sequence one sees U+FE0F as an ugly
box.

And when displaying the text representation sequence Ⓜ︎ U+24C2 U+FE0E
one always sees U+FE0E as a box no matter whether using “Symbola”,
“Noto Color Emoji” or “Joypixels”.

I am not sure whether this is wrong. Maybe it is OK to require a font
which can handle this? I am really not sure...

But what about # U+0023 NUMBER SIGN ?

This does have an emoji representation.

I.e. U+0023 U+FE0F displays in color as an emoji in pango-view and
gedit.

How could this ever work in Emacs? If you have to decide for a single
font to render U+0023 in Emacs, you would need to set a “capable” emoji
font for an ASCII character like #. One probably does not want to do
that. Then # in text representation would look different in style than
the other ASCII characters because it would come as the text
representation glyph from some emoji font which would probably not go
well together with other ASCII characters coming from some font like
for example “DejaVu Sans Mono”. So one probably wants to set
something like “DejaVu Sans Mono” for # as well, otherwise normal text
won’t look nice. But how can one display U+0023 U+FE0F as am emoji then?

This seems very messy, I don’t know how this can be solved.

> If you think this Emacs requirement for a capable font is incorrect, I
> suggest to post a question about this to the HarfBuzz mailing list,
> harfbuzz <at> lists.freedesktop.org, maybe HarfBuzz has capabilities in
> this regard that we somehow don't yet utilize.

Yes, I’ll try that, maybe that helps to understand it better.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 11:20:02 GMT) Full text and rfc822 format available.

Message #179 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 13:19:39 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
> Date: Sat, 29 Feb 2020 11:26:11 +0100
> 
> If I understand it correctly, with the current implementation in Emacs,
> it would work if a font had glyphs for both variation selectors.
> 
> I.e. if there was a font which had a colour glyph for
> 
> 24C2 FE0F  ; emoji style; # (1.1) CIRCLED LATIN CAPITAL LETTER M
> 
> *and* a black and white glyph for
> 
> 24C2 FE0E  ; text style;  # (1.1) CIRCLED LATIN CAPITAL LETTER M
> 
> Then it would work in Emacs and 24C2 FE0F would display in colour
> and 24C2 FE0E in black and white.

Yes, in that case Emacs will display both sequences using the same
font.

> But there are no such fonts (yet??). Symbola doesn’t support the
> variation selectors at all, i.e. when using Symbola
> 
> (set-fontset-font nil '(#x2460 . #x24FF) '("Symbola" . "iso10646-1") nil 'prepend)
> 
> one gets the three variants
> 
> Ⓜ U+24C2
> Ⓜ︎ U+24C2 U+FE0E
> Ⓜ️ U+24C2 U+FE0F
> 
> all in black and white and for the two variants which have the variation
> selectors one sees a narrow box in Emacs.
> 
> When using “Noto Color Emoji” or “Joypixels”, one gets all three
> variants in colour, and a box is only shown for the line in the middle
> with the U+FE0E text style selector because neither “Noto Color Emoji”
> nor “Joypixels” seem to implement that one. The emoji style selector
> U+FE0F does not show a box though, if I understand you correctly that
> means that apparently both “Noto Color Emoji” and “Joypixels” implement
> the U+FE0F variation selector.

OK, but then characters such as Ⓜ U+24C2 are supposed to be displayed
in their text presentation by default, so the sequence Ⓜ︎ U+24C2 U+FE0E
seems redundant, as it should display the same as just Ⓜ U+24C2.  So
this is not such a big loss for Emacs: you could use a font which
supports only the Ⓜ️ U+24C2 U+FE0F sequence, and use just Ⓜ U+24C2 for
the text presentation.

> If I paste these 3 lines into gedit (or anything else which uses pango
> for this) I see that different fonts  are used. Can also be seen with
> 
> pango-view --font="DejaVu Sans" ~/emoji-representation-test.txt

You could have the same in Emacs if you define a special face that
uses the other font, and then put that face on the sequence which
isn't composed using the font selected by Emacs.

> (I attached the emoji-representation-test.txt file and how it is
> displayed by pango-view).

I see only a small image showing the font name, and nothing else.
Some problem with sending the attachment?

> I specified the DejaVu Sans font on the command line (which is used for
> the ASCII text in that screenshot. For the emoji, different fonts are
> used, on my system where I made that screenshot it happens to be the
> font “MS Gothic” for the emoji in the first two lines and “Noto Color
> Emoji” for the last line. So pango uses different fonts for a text
> representation emoji sequence than for emoji representation.

Like I said, we need a more detailed understanding of how the font is
selected by Pango in these cases.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 11:38:02 GMT) Full text and rfc822 format available.

Message #182 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 12:36:51 +0100
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> Ⓜ U+24C2
>> Ⓜ︎ U+24C2 U+FE0E
>> Ⓜ️ U+24C2 U+FE0F
>> 
>> all in black and white and for the two variants which have the variation
>> selectors one sees a narrow box in Emacs.
>> 
>> When using “Noto Color Emoji” or “Joypixels”, one gets all three
>> variants in colour, and a box is only shown for the line in the middle
>> with the U+FE0E text style selector because neither “Noto Color Emoji”
>> nor “Joypixels” seem to implement that one. The emoji style selector
>> U+FE0F does not show a box though, if I understand you correctly that
>> means that apparently both “Noto Color Emoji” and “Joypixels” implement
>> the U+FE0F variation selector.
>
> OK, but then characters such as Ⓜ U+24C2 are supposed to be displayed
> in their text presentation by default,

Yes.

> so the sequence Ⓜ︎ U+24C2 U+FE0E
> seems redundant, as it should display the same as just Ⓜ U+24C2.

Yes, it seems a bit redundant. I was also surprised when I discovered
U+FE0E. I think *all* the emoji which can be followed by U+FE0E (all
those in
http://www.unicode.org/Public/emoji/12.0/emoji-variation-sequences.txt)
have text representation by default anyway, so why is U+FE0E needed at
all?

> So this is not such a big loss for Emacs: you could use a font which
> supports only the Ⓜ️ U+24C2 U+FE0F sequence, and use just Ⓜ U+24C2 for
> the text presentation.

Yes.

>> If I paste these 3 lines into gedit (or anything else which uses pango
>> for this) I see that different fonts  are used. Can also be seen with
>> 
>> pango-view --font="DejaVu Sans" ~/emoji-representation-test.txt
>
> You could have the same in Emacs if you define a special face that
> uses the other font, and then put that face on the sequence which
> isn't composed using the font selected by Emacs.

I think I don’t understand that completely. But you seem to say that it
is possible to make Emacs use different fonts for U+24C2 and the
sequence U+24C2 U+FE0F ? That sounds nice and would probably make it
work better.

>> (I attached the emoji-representation-test.txt file and how it is
>> displayed by pango-view).
>
> I see only a small image showing the font name, and nothing else.
> Some problem with sending the attachment?

Oh, sorry, I apparently clicked on the title bar of the window
when making the screenshot with the “import” tool. New screenshot
attached.

>> I specified the DejaVu Sans font on the command line (which is used for
>> the ASCII text in that screenshot. For the emoji, different fonts are
>> used, on my system where I made that screenshot it happens to be the
>> font “MS Gothic” for the emoji in the first two lines and “Noto Color
>> Emoji” for the last line. So pango uses different fonts for a text
>> representation emoji sequence than for emoji representation.
>
> Like I said, we need a more detailed understanding of how the font is
> selected by Pango in these cases.

-- 
Mike FABIAN <mfabian <at> redhat.com>
[pango-view-emoji-representation-test.txt.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 11:42:02 GMT) Full text and rfc822 format available.

Message #185 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 12:41:02 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

> OK, but then characters such as Ⓜ U+24C2 are supposed to be displayed
> in their text presentation by default, so the sequence Ⓜ︎ U+24C2 U+FE0E
> seems redundant, as it should display the same as just Ⓜ U+24C2.

http://unicode.org/faq/vs.html#5 says:

> Q: How should variation sequences be displayed?
> 
> A: When they are valid variation sequences, they should be displayed as
> illustrated in the Unicode code charts, the emoji charts, or in the
> Ideographic Variation Database. When a variation sequence is not valid
> or its display is not supported, the base character is displayed as
> usual, and the variation selector is invisible. See Display of
> Unsupported Characters.
> 
> Q:What about applications that don't support variation sequences?
> 
> A: Applications not supporting variation sequences should act as if the
> variation selector is not present. That normally applies to all text
> processes such as searching, sorting, parsing, and so forth.

So probably  Ⓜ︎ U+24C2 U+FE0E should not display U+FE0E as a box, as you
say it should exactly display as just Ⓜ U+24C2.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 11:53:01 GMT) Full text and rfc822 format available.

Message #188 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 13:52:17 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
> Date: Sat, 29 Feb 2020 12:14:28 +0100
> 
> > IOW, what we need is a detailed description of what Pango does here,
> > and how does Gedit affect that by configuring its default fonts.  Only
> > then we can reason about the differences between that and what Emacs
> > does.
> 
> Yes, you are right, and I think this is very difficult.
> 
> I don’t know the details, but Pango seems to “cut” text into “runs”
> where each “run” is rendered with a single font. And it tries to
> cut the text into “runs” in a way that the overall result looks
> as nice as possible.

Every text-shaping engine, including those used by Emacs, does that.
The devil is in the details: how exactly are the "runs" decided.

> >> Yes, it does tell that it was composed with the following character:
> >
> > And the resulting display is what you expect?  If not, then I think
> > you need to find a font which supports Emoji presentation of
> > characters such as Ⓜ, and make Emacs use it for those sequences.
> 
> Yes, in the case of Ⓜ️ U+24C2 U+FE0F the result in Emacs is perfect
> when using “Noto Color Emoji” or “Joypixels”. It is displayed in colour
> and behaves as a single character in the buffer, the variation selector
> is not displayed as a box. This is perfect.
> 
> But when using Symbola for the same sequence one sees U+FE0F as an ugly
> box.

So we should augment our default fontsets to use Emoji-capable fonts
in preference to those, like Symbola, which aren't.  And perhaps for
Emoji we should make the exception in the rule that we prefer the
default face's font, so that users will not need to tweak
use-default-font-for-symbols to have Emoji display with those capable
fonts.  Patches to these effects are welcome.

> But what about # U+0023 NUMBER SIGN ?
> 
> This does have an emoji representation.

The question is how important is to be able to display that character
as an Emoji, in the context of the jobs that Emacs is mainly used
for.  Maybe not too much.

> How could this ever work in Emacs? If you have to decide for a single
> font to render U+0023 in Emacs, you would need to set a “capable” emoji
> font for an ASCII character like #. One probably does not want to do
> that.

If fonts like DejaVu Sans Mono and others, routinely used for
displaying fixed-pitch text (such as program source code) acquire the
capabilities of displaying Emoji, that is exactly what should be done.
As long as the current tendency of using Emoji everywhere continues, I
see no reason not to expect those fonts to be enhanced to support
Emoji.

> Then # in text representation would look different in style than
> the other ASCII characters because it would come as the text
> representation glyph from some emoji font which would probably not go
> well together with other ASCII characters coming from some font like
> for example “DejaVu Sans Mono”. So one probably wants to set
> something like “DejaVu Sans Mono” for # as well, otherwise normal text
> won’t look nice. But how can one display U+0023 U+FE0F as am emoji then?
> 
> This seems very messy, I don’t know how this can be solved.

The 'face' text property method I mentioned elsewhere should still
work, even if a single font cannot display both text and Emoji
presentation of the sequences.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 11:59:02 GMT) Full text and rfc822 format available.

Message #191 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 13:58:11 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
> Date: Sat, 29 Feb 2020 12:36:51 +0100
> 
> > You could have the same in Emacs if you define a special face that
> > uses the other font, and then put that face on the sequence which
> > isn't composed using the font selected by Emacs.
> 
> I think I don’t understand that completely. But you seem to say that it
> is possible to make Emacs use different fonts for U+24C2 and the
> sequence U+24C2 U+FE0F ?

It should be possible, yes.  Define a new face, and make that face use
a font that can display sequences like U+24C2 U+FE0E.  Then put a
'face' text property whose value is that face you defined, on the text
containing such a sequence.  This should force Emacs to use the font
you specified for that stretch of text, regardless of the fontset and
the default font.

Given that we don't really see why sequences with U+FE0E are needed,
perhaps requiring Lisp programs which do want to display such
sequences to use a special face is not such a big deal?

> >> (I attached the emoji-representation-test.txt file and how it is
> >> displayed by pango-view).
> >
> > I see only a small image showing the font name, and nothing else.
> > Some problem with sending the attachment?
> 
> Oh, sorry, I apparently clicked on the title bar of the window
> when making the screenshot with the “import” tool. New screenshot
> attached.

OK, I see it now, thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 12:04:01 GMT) Full text and rfc822 format available.

Message #194 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 14:02:56 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
> Date: Sat, 29 Feb 2020 12:41:02 +0100
> 
> > Q:What about applications that don't support variation sequences?
> > 
> > A: Applications not supporting variation sequences should act as if the
> > variation selector is not present. That normally applies to all text
> > processes such as searching, sorting, parsing, and so forth.
> 
> So probably  Ⓜ︎ U+24C2 U+FE0E should not display U+FE0E as a box, as you
> say it should exactly display as just Ⓜ U+24C2.

You can control this via glyphless-char-display.  The thin 1-pixel
space is just the current default, and we could change that default if
we think it's better not to display the variation selectors at all.
It's just that the "Emacsy" way is not to conceal any characters from
the user, so the current default was chosen to follow that.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 17:00:02 GMT) Full text and rfc822 format available.

Message #197 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 17:59:25 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> > And the resulting display is what you expect?  If not, then I think
>> > you need to find a font which supports Emoji presentation of
>> > characters such as Ⓜ, and make Emacs use it for those sequences.
>> 
>> Yes, in the case of Ⓜ️ U+24C2 U+FE0F the result in Emacs is perfect
>> when using “Noto Color Emoji” or “Joypixels”. It is displayed in colour
>> and behaves as a single character in the buffer, the variation selector
>> is not displayed as a box. This is perfect.
>> 
>> But when using Symbola for the same sequence one sees U+FE0F as an ugly
>> box.
>
> So we should augment our default fontsets to use Emoji-capable fonts
> in preference to those, like Symbola, which aren't.  And perhaps for
> Emoji we should make the exception in the rule that we prefer the
> default face's font, so that users will not need to tweak
> use-default-font-for-symbols to have Emoji display with those capable
> fonts.  Patches to these effects are welcome.
>
>> But what about # U+0023 NUMBER SIGN ?
>> 
>> This does have an emoji representation.
>
> The question is how important is to be able to display that character
> as an Emoji, in the context of the jobs that Emacs is mainly used
> for.  Maybe not too much.

I agree, not very important at all. I am surprised why this exists at
all as an emoji.

>> How could this ever work in Emacs? If you have to decide for a single
>> font to render U+0023 in Emacs, you would need to set a “capable” emoji
>> font for an ASCII character like #. One probably does not want to do
>> that.
>
> If fonts like DejaVu Sans Mono and others, routinely used for
> displaying fixed-pitch text (such as program source code) acquire the
> capabilities of displaying Emoji, that is exactly what should be done.
> As long as the current tendency of using Emoji everywhere continues, I
> see no reason not to expect those fonts to be enhanced to support
> Emoji.

Yes, maybe that could be a long term solution.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 17:04:02 GMT) Full text and rfc822 format available.

Message #200 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 18:03:01 +0100
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
>> Date: Sat, 29 Feb 2020 12:36:51 +0100
>> 
>> > You could have the same in Emacs if you define a special face that
>> > uses the other font, and then put that face on the sequence which
>> > isn't composed using the font selected by Emacs.
>> 
>> I think I don’t understand that completely. But you seem to say that it
>> is possible to make Emacs use different fonts for U+24C2 and the
>> sequence U+24C2 U+FE0F ?
>
> It should be possible, yes.  Define a new face, and make that face use
> a font that can display sequences like U+24C2 U+FE0E.  Then put a
> 'face' text property whose value is that face you defined, on the text
> containing such a sequence.  This should force Emacs to use the font
> you specified for that stretch of text, regardless of the fontset and
> the default font.

But that would be a manual process? Or would it be displayed like that
by default?

> Given that we don't really see why sequences with U+FE0E are needed,
> perhaps requiring Lisp programs which do want to display such
> sequences to use a special face is not such a big deal?

Ah, I see, something like

(require 'improve-emoji-display)

which does this magic defining such a face.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 17:15:01 GMT) Full text and rfc822 format available.

Message #203 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 18:14:24 +0100
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
>> Date: Sat, 29 Feb 2020 12:41:02 +0100
>> 
>> > Q:What about applications that don't support variation sequences?
>> > 
>> > A: Applications not supporting variation sequences should act as if the
>> > variation selector is not present. That normally applies to all text
>> > processes such as searching, sorting, parsing, and so forth.
>> 
>> So probably  Ⓜ︎ U+24C2 U+FE0E should not display U+FE0E as a box, as you
>> say it should exactly display as just Ⓜ U+24C2.
>
> You can control this via glyphless-char-display.  The thin 1-pixel
> space is just the current default, and we could change that default if
> we think it's better not to display the variation selectors at all.
> It's just that the "Emacsy" way is not to conceal any characters from
> the user, so the current default was chosen to follow that.

I agree, not displaying anything at all is not so nice either.
It is nicer if one can find something there when stepping over it and
delete the U+FE0E, that is really more "Emacsy", I think.

Like when I have “AA” (U+0041 U+FEFF U+0041) I can still move the cursor
over that string and find that there is something between the two As,
check what that is, delete it if I want ...  

So a thin 1-pixel space sounds good to me, it does not look ugly but one
can still edit it.

But currently I don’t seem to get a thin 1-pixel space for

Ⓜ︎ U+24C2 U+FE0E

I get a box with a 1 pixel border in the foreground colour (black) and
this box has a total width of 6 pixels. See attached screenshot.

This looks fairly ugly is a bit too much maybe.

You get a 1-pixel space? Is special setup needed to get that or should
I get that by default?

-- 
Mike FABIAN <mfabian <at> redhat.com>

[display-of-fe0e.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 17:20:01 GMT) Full text and rfc822 format available.

Message #206 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 19:19:38 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
> Date: Sat, 29 Feb 2020 18:03:01 +0100
> 
> >> I think I don’t understand that completely. But you seem to say that it
> >> is possible to make Emacs use different fonts for U+24C2 and the
> >> sequence U+24C2 U+FE0F ?
> >
> > It should be possible, yes.  Define a new face, and make that face use
> > a font that can display sequences like U+24C2 U+FE0E.  Then put a
> > 'face' text property whose value is that face you defined, on the text
> > containing such a sequence.  This should force Emacs to use the font
> > you specified for that stretch of text, regardless of the fontset and
> > the default font.
> 
> But that would be a manual process? Or would it be displayed like that
> by default?

We could do something automatically using JIT font-lock mechanism,
perhaps.

> > Given that we don't really see why sequences with U+FE0E are needed,
> > perhaps requiring Lisp programs which do want to display such
> > sequences to use a special face is not such a big deal?
> 
> Ah, I see, something like
> 
> (require 'improve-emoji-display)
> 
> which does this magic defining such a face.

Yes.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 29 Feb 2020 17:29:01 GMT) Full text and rfc822 format available.

Message #209 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 29 Feb 2020 19:27:44 +0200
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  39799 <at> debbugs.gnu.org
> Date: Sat, 29 Feb 2020 18:14:24 +0100
> 
> But currently I don’t seem to get a thin 1-pixel space for
> 
> Ⓜ︎ U+24C2 U+FE0E
> 
> I get a box with a 1 pixel border in the foreground colour (black) and
> this box has a total width of 6 pixels. See attached screenshot.
> 
> This looks fairly ugly is a bit too much maybe.
> 
> You get a 1-pixel space? Is special setup needed to get that or should
> I get that by default?

Sorry, I was thinking about U+200D.  You are right, the variation
selectors by default display as hex codes in a box.  But that can be
changed.  This is set up in lisp/international/characters.el, which
see.  And you can set it manually, of course, since it's just a
char-table.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Mon, 02 Mar 2020 09:11:02 GMT) Full text and rfc822 format available.

Message #212 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39799 <at> debbugs.gnu.org, Mike FABIAN <mfabian <at> redhat.com>
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Mon, 02 Mar 2020 10:10:14 +0100
>>>>> On Sat, 29 Feb 2020 19:27:44 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    Eli> Sorry, I was thinking about U+200D.  You are right, the variation
    Eli> selectors by default display as hex codes in a box.  But that can be
    Eli> changed.  This is set up in lisp/international/characters.el, which
    Eli> see.  And you can set it manually, of course, since it's just a
    Eli> char-table.

customizing glyphless-char-display-control is probably the easiest way
to do that.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Mon, 02 Mar 2020 11:03:02 GMT) Full text and rfc822 format available.

Message #215 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Mon, 02 Mar 2020 13:02:09 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Mike FABIAN <mfabian <at> redhat.com>,  39799 <at> debbugs.gnu.org
> Date: Mon, 02 Mar 2020 10:10:14 +0100
> 
> >>>>> On Sat, 29 Feb 2020 19:27:44 +0200, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     Eli> Sorry, I was thinking about U+200D.  You are right, the variation
>     Eli> selectors by default display as hex codes in a box.  But that can be
>     Eli> changed.  This is set up in lisp/international/characters.el, which
>     Eli> see.  And you can set it manually, of course, since it's just a
>     Eli> char-table.
> 
> customizing glyphless-char-display-control is probably the easiest way
> to do that.

Yes, but my point was that could be done by default.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Mon, 20 Sep 2021 20:39:01 GMT) Full text and rfc822 format available.

Message #218 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Mon, 20 Sep 2021 22:38:28 +0200
>>>>> On Fri, 28 Feb 2020 21:56:04 +0100, Robert Pluim <rpluim <at> gmail.com> said:

>>>>> On Fri, 28 Feb 2020 22:16:21 +0200, Eli Zaretskii <eliz <at> gnu.org> said:
    >>> One thing though: the code currently does set-char-table-range to a
    >>> new value. Is there a chance that an entry already exists in
    >>> composition-function-table for a particular character?

    Eli> Only if the non-leading character is a combining character, which I
    Eli> think is unlikely.  But in general, yes, this should be tested up
    Eli> front to avoid losing composition rules.

    Robert> OK, Iʼll take that into account.

    Robert> Robert

Iʼve just pushed a change to master that should fix (almost) all the
issues with displaying emoji sequences (except for keycaps). Feedback
welcome.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 09:17:02 GMT) Full text and rfc822 format available.

Message #221 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 12:16:38 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: rgm <at> gnu.org,  39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
> Date: Mon, 20 Sep 2021 22:38:28 +0200
> 
> Iʼve just pushed a change to master that should fix (almost) all the
> issues with displaying emoji sequences (except for keycaps). Feedback
> welcome.

Thanks, this is mostly okay, IMO.  the only issue I have with this is
here:

> --- a/admin/unidata/blocks.awk
> +++ b/admin/unidata/blocks.awk
> @@ -221,6 +221,46 @@ FILENAME ~ "emoji-data.txt" && /^[0-9A-F].*; Emoji_Presentation / {
>  }
>  
>  END {
> +    ## These codepoints have Emoji_Presentation = No, but they are
> +    ## used in emoji-sequences.txt and emoji-zwj-sequences.txt (with a
> +    ## Variation Selector), so force them into the emoji script so
> +    ## they will get composed correctly.  FIXME: delete this when we
> +    ## can change the font used for a codepoint based on whether it's
> +    ## followed by a VS (usually VS-16)
> +    idx = 0
> +    override_start[idx] = "261D"
> +    override_end[idx] = "261D"
> +    idx++
> +    override_start[idx] = "26F9"
> +    override_end[idx] = "26F9"
> +    idx++
> +    override_start[idx] = "270C"
> +    override_end[idx] = "270D"
> +    idx++
> +    override_start[idx] = "2764"
> +    override_end[idx] = "2764"
> +    idx++
> +    override_start[idx] = "1F3CB"
> +    override_end[idx] = "1F3CC"
> +    idx++
> +    override_start[idx] = "1F3F3"
> +    override_end[idx] = "1F3F4"
> +    idx++
> +    override_start[idx] = "1F441"
> +    override_end[idx] = "1F441"
> +    idx++
> +    override_start[idx] = "1F575"
> +    override_end[idx] = "1F575"
> +
> +    for (k in override_start)
> +    {
> +        i++
> +        start[i] = override_start[k]
> +        end[i] = override_end[k]
> +        alt[i] = "emoji"
> +        name[i] = "Autogenerated emoji (override)"
> +    }

Specifically, the U+2xxx codepoints are now in the 'emoji' script,
which I think is undesirable, even if the price is that we won't
support the sequences in which those codepoints are followed by
VS-16.  So I think we should remove those codepoints from the above,
leaving only the U+1Fxxx" ones.

Btw, currently U+261D followed by VS-16 doesn't compose for me,
probably because compose-gstring-for-variation-glyph is hardcoded to
work only for Han characters, and U+261D isn't, or because that
function is not suited to VS-16 (it looks for glyph variations in the
font)?  Or am I missing something?

Now to my idea of supporting those "U+2xxx VS-16" sequences without
assigning them to the 'emoji' script:

The function autocmp_chars uses font_range to find whether the
sequence of characters that can be composed are supported by the same
font.  It currently takes the first character of the sequence, calls
font_for_char for it, then checks that all the rest of the characters
are supported by that font by calling font_encode_char.  In our case,
the first character of the sequence is U+2xxx, which is not in the
'emoji' script, so Emacs is likely to pick up a font that doesn't
support Emoji, and the composition will fail.  To avoid that, I
propose the following change:

  . add a new argument to font_range, the codepoint that triggered the
    composition
  . inside font_range, if that codepoint belongs to the 'emoji' script
    (use char-script-table to find that out), call font_for_char with
    a representative character for 'emoji' (from
    script-representative-chars) instead of the first character of the
    sequence, then check that all the sequence characters, including
    the first one, can be supported by that font; if they can, return
    that font to the caller, to be used for the composition

WDYT?

Btw, if you use Firefox or Chrome, or some other application that can
show Emoji sequences, or maybe just use HarfBuzz's hb-view, how does
the display of the U+2xxx changes when they are followed by VS-16?  Is
the change prominent enough for us to try to support it?  If not,
perhaps the above should be left out for the moment.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 10:35:02 GMT) Full text and rfc822 format available.

Message #224 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 12:34:45 +0200
[Message part 1 (text/plain, inline)]
>>>>> On Tue, 21 Sep 2021 12:16:38 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: rgm <at> gnu.org,  39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
    >> Date: Mon, 20 Sep 2021 22:38:28 +0200
    >> 
    >> Iʼve just pushed a change to master that should fix (almost) all the
    >> issues with displaying emoji sequences (except for keycaps). Feedback
    >> welcome.

    Eli> Thanks, this is mostly okay, IMO.  the only issue I have with this is
    Eli> here:

    Eli> Specifically, the U+2xxx codepoints are now in the 'emoji' script,
    Eli> which I think is undesirable, even if the price is that we won't
    Eli> support the sequences in which those codepoints are followed by
    Eli> VS-16.  So I think we should remove those codepoints from the above,
    Eli> leaving only the U+1Fxxx" ones.

OK, Iʼll adjust it.

    Eli> Btw, currently U+261D followed by VS-16 doesn't compose for me,
    Eli> probably because compose-gstring-for-variation-glyph is hardcoded to
    Eli> work only for Han characters, and U+261D isn't, or because that
    Eli> function is not suited to VS-16 (it looks for glyph variations in the
    Eli> font)?  Or am I missing something?

You mean it doesnʼt get treated as a composition, or the result looks
bad (despite the comments in compose-gstring-for-variation-glyph I
donʼt see it limiting things to Han anywhere)? I have the latter:

☝️

             position: 146 of 147 (99%), column: 0
            character: ☝ (displayed as ☝) (codepoint 9757, #o23035, #x261d)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0x261D
               script: emoji
               syntax: w 	which means: word
             category: .:Base
             to input: type "C-x 8 RET 261d" or "C-x 8 RET WHITE UP POINTING INDEX"
          buffer code: #xE2 #x98 #x9D
            file code: #xE2 #x98 #x9D (encoded by coding system utf-8-unix)
              display: composed to form "☝️" (see below)

Composed with the following character(s) "️" using this font:
  ftcrhb:-GOOG-Noto Color Emoji-normal-normal-normal-*-19-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 9757 69 24 0 24 18 5 nil]
with these character(s):
  ️ (#xfe0f) VARIATION SELECTOR-16

Character code properties: customize what to show
  name: WHITE UP POINTING INDEX
  general-category: So (Symbol, Other)
  decomposition: (9757) ('☝')

There are text properties here:
  fontified            nil

    Eli> Now to my idea of supporting those "U+2xxx VS-16" sequences without
    Eli> assigning them to the 'emoji' script:

    Eli> The function autocmp_chars uses font_range to find whether the
    Eli> sequence of characters that can be composed are supported by the same
    Eli> font.  It currently takes the first character of the sequence, calls
    Eli> font_for_char for it, then checks that all the rest of the characters
    Eli> are supported by that font by calling font_encode_char.  In our case,
    Eli> the first character of the sequence is U+2xxx, which is not in the
    Eli> 'emoji' script, so Emacs is likely to pick up a font that doesn't
    Eli> support Emoji, and the composition will fail.  To avoid that, I
    Eli> propose the following change:

    Eli>   . add a new argument to font_range, the codepoint that triggered the
    Eli>     composition
    Eli>   . inside font_range, if that codepoint belongs to the 'emoji' script
    Eli>     (use char-script-table to find that out), call font_for_char with
    Eli>     a representative character for 'emoji' (from
    Eli>     script-representative-chars) instead of the first character of the
    Eli>     sequence, then check that all the sequence characters, including
    Eli>     the first one, can be supported by that font; if they can, return
    Eli>     that font to the caller, to be used for the composition

    Eli> WDYT?

I think this means you'd have to add the Variation Selectors to the
emoji script, but it should work. Iʼm not sure that *all* the
characters need to be supported by the font: if thereʼs a ZWJ in
there, itʼs purely functional, so thereʼs no need for a glyph for it
(and Iʼm hoping harfbuzz agrees), but thatʼs a moot point for U+2xxx U+FE0F

    Eli> Btw, if you use Firefox or Chrome, or some other application that can
    Eli> show Emoji sequences, or maybe just use HarfBuzz's hb-view, how does
    Eli> the display of the U+2xxx changes when they are followed by VS-16?  Is
    Eli> the change prominent enough for us to try to support it?  If not,
    Eli> perhaps the above should be left out for the moment.

At least with chromium, the glyph becomes more colourful for about a
dozen codepoints, but not for U+261D (see attached). The VS-16 itself
is hidden.

Robert
--

[Screenshot from 2021-09-21 12-26-36.png (image/png, attachment)]
[Screenshot from 2021-09-21 12-26-11.png (image/png, attachment)]
[Screenshot from 2021-09-21 12-25-39.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 10:55:02 GMT) Full text and rfc822 format available.

Message #227 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 13:54:02 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: rgm <at> gnu.org,  39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
> Date: Tue, 21 Sep 2021 12:34:45 +0200
> 
>     Eli> Btw, currently U+261D followed by VS-16 doesn't compose for me,
>     Eli> probably because compose-gstring-for-variation-glyph is hardcoded to
>     Eli> work only for Han characters, and U+261D isn't, or because that
>     Eli> function is not suited to VS-16 (it looks for glyph variations in the
>     Eli> font)?  Or am I missing something?
> 
> You mean it doesnʼt get treated as a composition, or the result looks
> bad

The former.

> Composed with the following character(s) "️" using this font:
>   ftcrhb:-GOOG-Noto Color Emoji-normal-normal-normal-*-19-*-*-*-m-0-iso10646-1
> by these glyphs:
>   [0 1 9757 69 24 0 24 18 5 nil]
> with these character(s):
>   ️ (#xfe0f) VARIATION SELECTOR-16

I guess it's the font, then.  But anyway, it's confusing to have the
composition function in japanese.el, we should move it to composite.el
instead, I think.

> I think this means you'd have to add the Variation Selectors to the
> emoji script

Yes, of course.

> Iʼm not sure that *all* the
> characters need to be supported by the font: if thereʼs a ZWJ in
> there, itʼs purely functional, so thereʼs no need for a glyph for it
> (and Iʼm hoping harfbuzz agrees)

This is already handled in font_range:

  while (pos < *limit)
    {
      c = (NILP (string)
	   ? fetch_char_advance_no_check (&pos, &pos_byte)
	   : fetch_string_char_advance_no_check (string, &pos, &pos_byte));
      Lisp_Object category = CHAR_TABLE_REF (Vunicode_category_table, c);
      if (FIXNUMP (category)
	  && (XFIXNUM (category) == UNICODE_CATEGORY_Cf  <<<<<<<<<<<<<<<<<<<<
	      || CHAR_VARIATION_SELECTOR_P (c)))
	continue;

>     Eli> Btw, if you use Firefox or Chrome, or some other application that can
>     Eli> show Emoji sequences, or maybe just use HarfBuzz's hb-view, how does
>     Eli> the display of the U+2xxx changes when they are followed by VS-16?  Is
>     Eli> the change prominent enough for us to try to support it?  If not,
>     Eli> perhaps the above should be left out for the moment.
> 
> At least with chromium, the glyph becomes more colourful for about a
> dozen codepoints, but not for U+261D (see attached).

So it _is_ worth supporting.  Would you please make those changes in
font_range and in blocks.awk?

> The VS-16 itself is hidden.

If the composition succeeds, it will be hidden in Emacs as well.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 11:32:02 GMT) Full text and rfc822 format available.

Message #230 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: rpluim <at> gmail.com
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50;
 Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 14:31:10 +0300
> Date: Tue, 21 Sep 2021 13:54:02 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
> 
> > I think this means you'd have to add the Variation Selectors to the
> > emoji script
> 
> Yes, of course.

Btw, we could recognize VS-16 explicitly in font_range, and avoid
putting them into the 'emoji' script.  But that would be too kludgey,
I guess.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 11:49:02 GMT) Full text and rfc822 format available.

Message #233 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 13:48:17 +0200
[Message part 1 (text/plain, inline)]
Robert Pluim <rpluim <at> gmail.com> さんはかきました:

>>>>>> On Fri, 28 Feb 2020 21:56:04 +0100, Robert Pluim <rpluim <at> gmail.com> said:
>
>>>>>> On Fri, 28 Feb 2020 22:16:21 +0200, Eli Zaretskii <eliz <at> gnu.org> said:
>     >>> One thing though: the code currently does set-char-table-range to a
>     >>> new value. Is there a chance that an entry already exists in
>     >>> composition-function-table for a particular character?
>
>     Eli> Only if the non-leading character is a combining character, which I
>     Eli> think is unlikely.  But in general, yes, this should be tested up
>     Eli> front to avoid losing composition rules.
>
>     Robert> OK, Iʼll take that into account.
>
>     Robert> Robert
>
> Iʼve just pushed a change to master that should fix (almost) all the
> issues with displaying emoji sequences (except for keycaps). Feedback
> welcome.

Should that also fix the skin tones?
Because '👩🏽' (U+1F469 U+1F3FD) still displays as two characters for me although
the cursor moves over it as one character.

The font used seems to be JoyPixels:

                 position: 1928 of 2088 (92%), restriction: <669-2089>, column: 1
                character: 👩 (displayed as 👩) (codepoint 128105, #o372151, #x1f469)
                  charset: unicode (Unicode (ISO10646))
    code point in charset: 0x1F469
                   script: emoji
                   syntax: w 	which means: word
                 category: .:Base
                 to input: type "C-x 8 RET 1f469" or "C-x 8 RET WOMAN"
              buffer code: #xF0 #x9F #x91 #xA9
                file code: #xF0 #x9F #x91 #xA9 (encoded by coding system utf-8-emacs)
                  display: composed to form "👩🏽" (see below)
    
    Composed with the following character(s) "🏽" using this font:
      ftcrhb:-GOOG-JoyPixels-normal-normal-normal-*-16-*-*-*-*-0-iso10646-1
    by these glyphs:
      [0 1 128105 2168 19 0 19 15 4 nil]
    with these character(s):
      🏽 (#x1f3fd) EMOJI MODIFIER FITZPATRICK TYPE-4
    
    Character code properties: customize what to show
      name: WOMAN
      general-category: So (Symbol, Other)
      decomposition: (128105) ('👩')
    
    There are text properties here:
      fontified            t

Also from the way it looks it seems to be JoyPixels.

And that font seems to support the skintones in other programs like
emoji-picker:

[Screenshot.png (image/png, attachment)]
[Message part 3 (text/plain, inline)]
-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 11:59:01 GMT) Full text and rfc822 format available.

Message #236 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rgm <at> gnu.org, rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 14:58:35 +0300
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
> Date: Tue, 21 Sep 2021 13:48:17 +0200
> 
> > Iʼve just pushed a change to master that should fix (almost) all the
> > issues with displaying emoji sequences (except for keycaps). Feedback
> > welcome.
> 
> Should that also fix the skin tones?

It should, and I thought HarfBuzz on Cairo already supported that?

Can you try this with hb-view and see if HarfBuzz produces a single
glyph/grapheme from this sequence?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 12:28:02 GMT) Full text and rfc822 format available.

Message #239 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 14:27:39 +0200
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
>> Date: Tue, 21 Sep 2021 13:48:17 +0200
>> 
>> > Iʼve just pushed a change to master that should fix (almost) all the
>> > issues with displaying emoji sequences (except for keycaps). Feedback
>> > welcome.
>> 
>> Should that also fix the skin tones?
>
> It should, and I thought HarfBuzz on Cairo already supported that?

Yes, and I think my screenshot shows that it does because my Screenshot
uses Pango (and the rest of the rendering stack including HarfBuzz and
Cairo). 

> Can you try this with hb-view and see if HarfBuzz produces a single
> glyph/grapheme from this sequence?

$ hb-view --annotate --font-file=/home/mfabian/.fonts/joypixels-6.6/android/joypixels-android.
ttf --font-size=50 --text="👩🏽"

looks like:

[Screenshot.png (image/png, attachment)]
[Message part 3 (text/plain, inline)]

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 12:38:02 GMT) Full text and rfc822 format available.

Message #242 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rgm <at> gnu.org, rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 15:37:45 +0300
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
> Date: Tue, 21 Sep 2021 14:27:39 +0200
> 
> >> Should that also fix the skin tones?
> >
> > It should, and I thought HarfBuzz on Cairo already supported that?
> 
> Yes, and I think my screenshot shows that it does because my Screenshot
> uses Pango (and the rest of the rendering stack including HarfBuzz and
> Cairo). 

Now I'm confused: what do you mean here by "it does"?  Does Emacs
support that, or does some other program support it?  If Emacs, then
why did you just tell there was a problem?

When I said "I thought HarfBuzz on Cairo already supported that", I
meant Emacs that uses HarfBuzz on Cairo.  I'm pretty sure we do
support color Emoji in that configuration.

> > Can you try this with hb-view and see if HarfBuzz produces a single
> > glyph/grapheme from this sequence?
> 
> $ hb-view --annotate --font-file=/home/mfabian/.fonts/joypixels-6.6/android/joypixels-android.
> ttf --font-size=50 --text="👩🏽"
> 
> looks like:

The image looks partial and pixelated.  Can you produce PNG or JPEG
or some other color image file, and attach it?

Anyway, if hb-view produces a single glyph, then I guess we need to
debug ftcrfont.c and/or hbfont.c to see why we we produce 2 glyphs in
that case.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 12:51:02 GMT) Full text and rfc822 format available.

Message #245 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, Mike FABIAN <mfabian <at> redhat.com>
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 14:50:22 +0200
>>>>> On Tue, 21 Sep 2021 14:58:35 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Mike FABIAN <mfabian <at> redhat.com>
    >> Cc: Eli Zaretskii <eliz <at> gnu.org>,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
    >> Date: Tue, 21 Sep 2021 13:48:17 +0200
    >> 
    >> > Iʼve just pushed a change to master that should fix (almost) all the
    >> > issues with displaying emoji sequences (except for keycaps). Feedback
    >> > welcome.
    >> 
    >> Should that also fix the skin tones?

    Eli> It should, and I thought HarfBuzz on Cairo already supported that?

It does, but with 'Noto Color Emoji' it doesnʼt work for all
codepoints.

    Eli> Can you try this with hb-view and see if HarfBuzz produces a single
    Eli> glyph/grapheme from this sequence?

For reference, since the documentation of hb-view is sadly lacking:

hb-view  --output-file=foo.svg --font-size=14 \
/usr/share/fonts/truetype/noto/NotoColorEmoji.ttf  -u 1f469

produces a glyph with no skin tone. And

hb-view  --output-file=foo.svg --font-size=14 \
/usr/share/fonts/truetype/noto/NotoColorEmoji.ttf  -u 1f469,1f3fd

Produces one with a medium tone. The skin tones are working for other
code points, so I donʼt know what's causing this particular problem.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 13:08:01 GMT) Full text and rfc822 format available.

Message #248 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 16:06:53 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Mike FABIAN <mfabian <at> redhat.com>,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
> Date: Tue, 21 Sep 2021 14:50:22 +0200
> 
>     >> Should that also fix the skin tones?
> 
>     Eli> It should, and I thought HarfBuzz on Cairo already supported that?
> 
> It does, but with 'Noto Color Emoji' it doesnʼt work for all
> codepoints.
> 
>     Eli> Can you try this with hb-view and see if HarfBuzz produces a single
>     Eli> glyph/grapheme from this sequence?
> 
> For reference, since the documentation of hb-view is sadly lacking:
> 
> hb-view  --output-file=foo.svg --font-size=14 \
> /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf  -u 1f469
> 
> produces a glyph with no skin tone. And
> 
> hb-view  --output-file=foo.svg --font-size=14 \
> /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf  -u 1f469,1f3fd
> 
> Produces one with a medium tone. The skin tones are working for other
> code points, so I donʼt know what's causing this particular problem.

If it works with hb-view, but not in Emacs, it's our problem.  If it
doesn't work in hb-view as well, perhaps we should ask the HarfBuzz
developers what they have to say about this.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 13:27:02 GMT) Full text and rfc822 format available.

Message #251 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, Robert Pluim <rpluim <at> gmail.com>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 15:25:46 +0200
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Robert Pluim <rpluim <at> gmail.com>
>> Cc: Mike FABIAN <mfabian <at> redhat.com>,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
>> Date: Tue, 21 Sep 2021 14:50:22 +0200
>> 
>>     >> Should that also fix the skin tones?
>> 
>>     Eli> It should, and I thought HarfBuzz on Cairo already supported that?
>> 
>> It does, but with 'Noto Color Emoji' it doesnʼt work for all
>> codepoints.
>> 
>>     Eli> Can you try this with hb-view and see if HarfBuzz produces a single
>>     Eli> glyph/grapheme from this sequence?
>> 
>> For reference, since the documentation of hb-view is sadly lacking:
>> 
>> hb-view  --output-file=foo.svg --font-size=14 \
>> /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf  -u 1f469
>> 
>> produces a glyph with no skin tone. And
>> 
>> hb-view  --output-file=foo.svg --font-size=14 \
>> /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf  -u 1f469,1f3fd
>> 
>> Produces one with a medium tone. The skin tones are working for other
>> code points, so I donʼt know what's causing this particular problem.
>
> If it works with hb-view, but not in Emacs, it's our problem.  If it
> doesn't work in hb-view as well, perhaps we should ask the HarfBuzz
> developers what they have to say about this.

It does work with hb-view.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 13:55:02 GMT) Full text and rfc822 format available.

Message #254 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rgm <at> gnu.org, Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 15:53:52 +0200
>>>>> On Tue, 21 Sep 2021 15:25:46 +0200, Mike FABIAN <mfabian <at> redhat.com> said:
    >> If it works with hb-view, but not in Emacs, it's our problem.  If it
    >> doesn't work in hb-view as well, perhaps we should ask the HarfBuzz
    >> developers what they have to say about this.

    Mike> It does work with hb-view.

Itʼs a problem with the way we generate the auto composition
sequences. If I remove the ZWJ sequences for eg 1f469, then all the
skin tone sequences for 1f469 work (and the generating script has an
embarassing but harmless typo, which Iʼll fix along the way)

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 14:20:02 GMT) Full text and rfc822 format available.

Message #257 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 17:19:23 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
> Date: Tue, 21 Sep 2021 15:53:52 +0200
> 
>     Mike> It does work with hb-view.
> 
> Itʼs a problem with the way we generate the auto composition
> sequences. If I remove the ZWJ sequences for eg 1f469, then all the
> skin tone sequences for 1f469 work

Not sure I understand.  The sequence U+1F4F9,U+1F3FD indeed does not
appear in emoji-zwj.el, but it does appear in emoji-sequences.txt.
However, the string "👩🏽" doesn't match the regexp in the
composition-function-table's slot for U+1F4F9.  Why is this?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 14:44:02 GMT) Full text and rfc822 format available.

Message #260 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 16:43:17 +0200
>>>>> On Tue, 21 Sep 2021 17:19:23 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: Eli Zaretskii <eliz <at> gnu.org>,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
    >> Date: Tue, 21 Sep 2021 15:53:52 +0200
    >> 
    Mike> It does work with hb-view.
    >> 
    >> Itʼs a problem with the way we generate the auto composition
    >> sequences. If I remove the ZWJ sequences for eg 1f469, then all the
    >> skin tone sequences for 1f469 work

    Eli> Not sure I understand.  The sequence U+1F4F9,U+1F3FD indeed does not
    Eli> appear in emoji-zwj.el, but it does appear in emoji-sequences.txt.
    Eli> However, the string "👩🏽" doesn't match the regexp in the
    Eli> composition-function-table's slot for U+1F4F9.  Why is this?

Because for skin tones we index on the modifier, and use lookback:

;; Skin tones
(set-char-table-range composition-function-table
                      '(#x1F3FB . #x1F3FF)
                      (nconc (char-table-range composition-function-table '(#x1F3FB . #x1F3FF))
                             (list (vector ".[\U0001F3FB-\U0001F3FF]"
                                           1
                                    'compose-gstring-for-graphic))))

Iʼve just tried adding "\N{U+1F469}\N{U+1F3FE}" to the composition
function table regexp for U+1F469 manually, and now I get correct
composition. That means we could process the
RGI_Emoji_Modifier_Sequence entries from emoji-sequences.txt with
emoji-zwj.awk and add them, indexed on the base character (and remove
the above code).

Iʼd still like to understand where things are going wrong though.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 15:59:01 GMT) Full text and rfc822 format available.

Message #263 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 18:58:38 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: mfabian <at> redhat.com,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
> Date: Tue, 21 Sep 2021 16:43:17 +0200
> 
>     Eli> Not sure I understand.  The sequence U+1F4F9,U+1F3FD indeed does not
>     Eli> appear in emoji-zwj.el, but it does appear in emoji-sequences.txt.
>     Eli> However, the string "👩🏽" doesn't match the regexp in the
>     Eli> composition-function-table's slot for U+1F4F9.  Why is this?
> 
> Because for skin tones we index on the modifier, and use lookback:
> 
> ;; Skin tones
> (set-char-table-range composition-function-table
>                       '(#x1F3FB . #x1F3FF)
>                       (nconc (char-table-range composition-function-table '(#x1F3FB . #x1F3FF))
>                              (list (vector ".[\U0001F3FB-\U0001F3FF]"
>                                            1
>                                     'compose-gstring-for-graphic))))

Ah, okay.  But why isn't that working?

> Iʼve just tried adding "\N{U+1F469}\N{U+1F3FE}" to the composition
> function table regexp for U+1F469 manually, and now I get correct
> composition. That means we could process the
> RGI_Emoji_Modifier_Sequence entries from emoji-sequences.txt with
> emoji-zwj.awk and add them, indexed on the base character (and remove
> the above code).
> 
> Iʼd still like to understand where things are going wrong though.

Are you debugging this, or would you like me to take a look?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 16:11:02 GMT) Full text and rfc822 format available.

Message #266 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 18:10:37 +0200
>>>>> On Tue, 21 Sep 2021 18:58:38 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: mfabian <at> redhat.com,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
    >> Date: Tue, 21 Sep 2021 16:43:17 +0200
    >> 
    Eli> Not sure I understand.  The sequence U+1F4F9,U+1F3FD indeed does not
    Eli> appear in emoji-zwj.el, but it does appear in emoji-sequences.txt.
    Eli> However, the string "👩🏽" doesn't match the regexp in the
    Eli> composition-function-table's slot for U+1F4F9.  Why is this?
    >> 
    >> Because for skin tones we index on the modifier, and use lookback:
    >> 
    >> ;; Skin tones
    >> (set-char-table-range composition-function-table
    >> '(#x1F3FB . #x1F3FF)
    >> (nconc (char-table-range composition-function-table '(#x1F3FB . #x1F3FF))
    >> (list (vector ".[\U0001F3FB-\U0001F3FF]"
    >> 1
    >> 'compose-gstring-for-graphic))))

    Eli> Ah, okay.  But why isn't that working?

I have no idea. Even a single entry for U+1F469 in
composition-function-table in emoji-zwj.el messes things up.

    >> Iʼve just tried adding "\N{U+1F469}\N{U+1F3FE}" to the composition
    >> function table regexp for U+1F469 manually, and now I get correct
    >> composition. That means we could process the
    >> RGI_Emoji_Modifier_Sequence entries from emoji-sequences.txt with
    >> emoji-zwj.awk and add them, indexed on the base character (and remove
    >> the above code).
    >>

This turns out to be a pretty small change, so if we donʼt get to the
bottom of it we have an alternative.

    >> Iʼd still like to understand where things are going wrong though.

    Eli> Are you debugging this, or would you like me to take a look?

Iʼd appreciate it if you have time. Itʼs not code Iʼm very familiar
with (and someone asked me to implement VS-16 based composition, so
Iʼm busy :-) )

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 16:24:01 GMT) Full text and rfc822 format available.

Message #269 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 19:23:41 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: mfabian <at> redhat.com,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
> Date: Tue, 21 Sep 2021 18:10:37 +0200
> 
> >>>>> On Tue, 21 Sep 2021 18:58:38 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     >> Because for skin tones we index on the modifier, and use lookback:
>     >> 
>     >> ;; Skin tones
>     >> (set-char-table-range composition-function-table
>     >> '(#x1F3FB . #x1F3FF)
>     >> (nconc (char-table-range composition-function-table '(#x1F3FB . #x1F3FF))
>     >> (list (vector ".[\U0001F3FB-\U0001F3FF]"
>     >> 1
>     >> 'compose-gstring-for-graphic))))
> 
>     Eli> Ah, okay.  But why isn't that working?
> 
> I have no idea. Even a single entry for U+1F469 in
> composition-function-table in emoji-zwj.el messes things up.

This rang a bell, so I looked around.  And sure enough, there's this
subtlety documented in the doc string of composition-function-table:

  The element at index C in the table, if non-nil, is a list of
  composition rules of the form ([PATTERN PREV-CHARS FUNC] ...);
  the rules must be specified in the descending order of PREV-CHARS
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  values.
  ^^^^^^

(I could find the code which enforces this, if necessary, but I
clearly remember bumping into this in misc-lang.el, with Arabic
composition rules, which is when I added the above to documentation.)

And emoji-zwj.el doesn't adhere to this condition.  If you reorder the
rules as required above, does the problem go away (I cannot test this
myself, as I don't have access to a system where color Emoji work in
Emacs)?

>     Eli> Are you debugging this, or would you like me to take a look?
> 
> Iʼd appreciate it if you have time. Itʼs not code Iʼm very familiar
> with (and someone asked me to implement VS-16 based composition, so
> Iʼm busy :-) )

If the above doesn't work, I will dig more.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 16:51:01 GMT) Full text and rfc822 format available.

Message #272 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: rpluim <at> gmail.com
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50;
 Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 19:50:00 +0300
> Date: Tue, 21 Sep 2021 19:23:41 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
> 
>   The element at index C in the table, if non-nil, is a list of
>   composition rules of the form ([PATTERN PREV-CHARS FUNC] ...);
>   the rules must be specified in the descending order of PREV-CHARS
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   values.
>   ^^^^^^
> 
> (I could find the code which enforces this, if necessary, but I
> clearly remember bumping into this in misc-lang.el, with Arabic
> composition rules, which is when I added the above to documentation.)
> 
> And emoji-zwj.el doesn't adhere to this condition.  If you reorder the
> rules as required above, does the problem go away (I cannot test this
> myself, as I don't have access to a system where color Emoji work in
> Emacs)?

Actually, ignore me: the above is for rules for the same character,
whereas emoji-zwj.el doesn't have such rules.

I will take a look at the code.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 17:44:02 GMT) Full text and rfc822 format available.

Message #275 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 19:43:05 +0200
>>>>> On Tue, 21 Sep 2021 14:31:10 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> Date: Tue, 21 Sep 2021 13:54:02 +0300
    >> From: Eli Zaretskii <eliz <at> gnu.org>
    >> Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
    >> 
    >> > I think this means you'd have to add the Variation Selectors to the
    >> > emoji script
    >> 
    >> Yes, of course.

    Eli> Btw, we could recognize VS-16 explicitly in font_range, and avoid
    Eli> putting them into the 'emoji' script.  But that would be too kludgey,
    Eli> I guess.

FE0F is already in script-representative-chars for emoji :-)

Eli, is this the kind of thing you were thinking of? Seems to work so
far (with a small addition to blocks.awk).

We'll need to find a better name for the new arg than 'trigger'
though.

diff --git a/src/composite.c b/src/composite.c
index e97f8e2b4c..85485e9358 100644
--- a/src/composite.c
+++ b/src/composite.c
@@ -882,14 +882,15 @@ fill_gstring_body (Lisp_Object gstring)
 /* Try to compose the characters at CHARPOS according to composition
    rule RULE ([PATTERN PREV-CHARS FUNC]).  LIMIT limits the characters
    to compose.  STRING, if not nil, is a target string.  WIN is a
-   window where the characters are being displayed.  If characters are
+   window where the characters are being displayed.  TRIGGER is the
+   character that triggered the composition check.  If characters are
    successfully composed, return the composition as a glyph-string
    object.  Otherwise return nil.  */
 
 static Lisp_Object
 autocmp_chars (Lisp_Object rule, ptrdiff_t charpos, ptrdiff_t bytepos,
 	       ptrdiff_t limit, struct window *win, struct face *face,
-	       Lisp_Object string, Lisp_Object direction)
+	       Lisp_Object string, Lisp_Object direction, int trigger)
 {
   ptrdiff_t count = SPECPDL_INDEX ();
   Lisp_Object pos = make_fixnum (charpos);
@@ -920,7 +921,7 @@ autocmp_chars (Lisp_Object rule, ptrdiff_t charpos, ptrdiff_t bytepos,
   struct frame *f = XFRAME (font_object);
   if (FRAME_WINDOW_P (f))
     {
-      font_object = font_range (charpos, bytepos, &to, win, face, string);
+      font_object = font_range (charpos, bytepos, &to, win, face, string, trigger);
       if (! FONT_OBJECT_P (font_object)
 	  || (! NILP (re)
 	      && to < limit
@@ -1269,7 +1270,7 @@ composition_reseat_it (struct composition_it *cmp_it, ptrdiff_t charpos,
 	      if (XFIXNAT (AREF (elt, 1)) != cmp_it->lookback)
 		goto no_composition;
 	      lgstring = autocmp_chars (elt, charpos, bytepos, endpos,
-					w, face, string, direction);
+					w, face, string, direction, cmp_it->ch);
 	      if (composition_gstring_p (lgstring))
 		break;
 	      lgstring = Qnil;
@@ -1307,7 +1308,7 @@ composition_reseat_it (struct composition_it *cmp_it, ptrdiff_t charpos,
 	  else
 	    direction = QR2L;
 	  lgstring = autocmp_chars (elt, cpos, bpos, charpos + 1, w, face,
-				    string, direction);
+				    string, direction, cmp_it->ch);
 	  if (! composition_gstring_p (lgstring)
 	      || cpos + LGSTRING_CHAR_LEN (lgstring) - 1 != charpos)
 	    /* Composition failed or didn't cover the current
@@ -1676,7 +1677,7 @@ find_automatic_composition (ptrdiff_t pos, ptrdiff_t limit, ptrdiff_t backlim,
 		  for (check = cur; check_pos < check.pos; )
 		    BACKWARD_CHAR (check, stop);
 		  *gstring = autocmp_chars (elt, check.pos, check.pos_byte,
-					    tail, w, NULL, string, Qnil);
+					    tail, w, NULL, string, Qnil, c);
 		  need_adjustment = 1;
 		  if (NILP (*gstring))
 		    {
diff --git a/src/font.c b/src/font.c
index e043ef8d01..74a1214b38 100644
--- a/src/font.c
+++ b/src/font.c
@@ -3866,6 +3866,9 @@ font_at (int c, ptrdiff_t pos, struct face *face, struct window *w,
    If STRING is not nil, it is the string to check instead of the current
    buffer.  In that case, FACE must be not NULL.
 
+   TRIGGER is the character that actually caused the composition
+   process to start, it may be different from the character at POS.
+
    The return value is the font-object for the character at POS.
    *LIMIT is set to the position where that font can't be used.
 
@@ -3873,15 +3876,16 @@ font_at (int c, ptrdiff_t pos, struct face *face, struct window *w,
 
 Lisp_Object
 font_range (ptrdiff_t pos, ptrdiff_t pos_byte, ptrdiff_t *limit,
-	    struct window *w, struct face *face, Lisp_Object string)
+	    struct window *w, struct face *face, Lisp_Object string,
+	    int trigger)
 {
   ptrdiff_t ignore;
   int c;
   Lisp_Object font_object = Qnil;
+  struct frame *f = XFRAME (w->frame);
 
   if (!face)
     {
-      struct frame *f = XFRAME (w->frame);
       int face_id;
 
       if (NILP (string))
@@ -3912,6 +3916,23 @@ font_range (ptrdiff_t pos, ptrdiff_t pos_byte, ptrdiff_t *limit,
 	continue;
       if (NILP (font_object))
 	{
+	  if (EQ (CHAR_TABLE_REF (Vchar_script_table, trigger),
+		  Qemoji))
+	    {
+	      Lisp_Object val = assq_no_quit (Qemoji, Vscript_representative_chars);
+	      if (CONSP (val))
+		{
+		  int face_id;
+		  val = XCDR (val);
+		  if (CONSP (val))
+		    val = XCAR (val);
+		  else if (VECTORP (val))
+		    val = AREF (val, 0);
+		  c = XFIXNAT (val);
+		  face_id = FACE_FOR_CHAR (f, face, c, pos - 1, string);
+		  face = FACE_FROM_ID (f, face_id);
+		}
+	    }
 	  font_object = font_for_char (face, c, pos - 1, string);
 	  if (NILP (font_object))
 	    return Qnil;
@@ -5423,6 +5444,7 @@ syms_of_font (void)
   DEFSYM (Qiso8859_1, "iso8859-1");
   DEFSYM (Qiso10646_1, "iso10646-1");
   DEFSYM (Qunicode_bmp, "unicode-bmp");
+  DEFSYM (Qemoji, "emoji");
 
   /* Symbols representing keys of font extra info.  */
   DEFSYM (QCotf, ":otf");
diff --git a/src/font.h b/src/font.h
index d3e1530642..1da72cca07 100644
--- a/src/font.h
+++ b/src/font.h
@@ -885,7 +885,7 @@ valid_font_driver (struct font_driver const *d)
 extern Lisp_Object font_update_drivers (struct frame *f, Lisp_Object list);
 extern Lisp_Object font_range (ptrdiff_t, ptrdiff_t, ptrdiff_t *,
 			       struct window *, struct face *,
-			       Lisp_Object);
+			       Lisp_Object, int);
 extern void font_fill_lglyph_metrics (Lisp_Object, struct font *, unsigned int);
 
 extern Lisp_Object font_put_extra (Lisp_Object font, Lisp_Object prop,




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 18:21:02 GMT) Full text and rfc822 format available.

Message #278 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: rpluim <at> gmail.com
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50;
 Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 21:20:07 +0300
> Date: Tue, 21 Sep 2021 19:50:00 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
> 
> I will take a look at the code.

AFAICT, the composition machinery doesn't allow to have a rule with
non-zero PREV-CHARS where PREV-CHARS are supposed to match characters
that have their own rules in composition-function-table.  Such rules
are never considered, because once the rules for a character are
processed, we never reconsider what to do with that character.  I will
ask Kenichi Handa whether my conclusion is indeed correct, but for
now, I think we should take the fire escape you mentioned earlier:

>     >> Iʼve just tried adding "\N{U+1F469}\N{U+1F3FE}" to the composition
>     >> function table regexp for U+1F469 manually, and now I get correct
>     >> composition. That means we could process the
>     >> RGI_Emoji_Modifier_Sequence entries from emoji-sequences.txt with
>     >> emoji-zwj.awk and add them, indexed on the base character (and remove
>     >> the above code).
>     >>
> 
> This turns out to be a pretty small change, so if we donʼt get to the
> bottom of it we have an alternative.

AFAIU, this means we will add 1F3FB..1F3FF to the characters that can
follow each of those which have rules with zero lookback.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Tue, 21 Sep 2021 18:29:01 GMT) Full text and rfc822 format available.

Message #281 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 21:28:40 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: rgm <at> gnu.org,  39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
> Date: Tue, 21 Sep 2021 19:43:05 +0200
> 
> Eli, is this the kind of thing you were thinking of? Seems to work so
> far (with a small addition to blocks.awk).

Yes, with a minor comment below.

> We'll need to find a better name for the new arg than 'trigger'
> though.

How about just 'ch'?  We use such names all over the place, so
describing what it is in the comment should be enough.

> @@ -3912,6 +3916,23 @@ font_range (ptrdiff_t pos, ptrdiff_t pos_byte, ptrdiff_t *limit,
>  	continue;
>        if (NILP (font_object))
>  	{
> +	  if (EQ (CHAR_TABLE_REF (Vchar_script_table, trigger),
> +		  Qemoji))
> +	    {
> +	      Lisp_Object val = assq_no_quit (Qemoji, Vscript_representative_chars);
> +	      if (CONSP (val))
> +		{
> +		  int face_id;
> +		  val = XCDR (val);
> +		  if (CONSP (val))
> +		    val = XCAR (val);
> +		  else if (VECTORP (val))
> +		    val = AREF (val, 0);
> +		  c = XFIXNAT (val);
> +		  face_id = FACE_FOR_CHAR (f, face, c, pos - 1, string);
> +		  face = FACE_FROM_ID (f, face_id);
> +		}
> +	    }
>  	  font_object = font_for_char (face, c, pos - 1, string);
>  	  if (NILP (font_object))
>  	    return Qnil;

For backward compatibility, I'd prefer here, if font_for_char returns
nil for the representative Emoji character, to call font_for_char
again with the face and codepoint for the original character.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Wed, 22 Sep 2021 09:00:01 GMT) Full text and rfc822 format available.

Message #284 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Wed, 22 Sep 2021 10:59:00 +0200
>>>>> On Tue, 21 Sep 2021 21:20:07 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> Date: Tue, 21 Sep 2021 19:50:00 +0300
    >> From: Eli Zaretskii <eliz <at> gnu.org>
    >> Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
    >> 
    >> I will take a look at the code.

    Eli> AFAICT, the composition machinery doesn't allow to have a rule with
    Eli> non-zero PREV-CHARS where PREV-CHARS are supposed to match characters
    Eli> that have their own rules in composition-function-table.  Such rules
    Eli> are never considered, because once the rules for a character are
    Eli> processed, we never reconsider what to do with that character.  I will
    Eli> ask Kenichi Handa whether my conclusion is indeed correct, but for
    Eli> now, I think we should take the fire escape you mentioned earlier:

OK. Those are surprising semantics. If it does turn out to be like
that, we should document it.

    >> >> Iʼve just tried adding "\N{U+1F469}\N{U+1F3FE}" to the composition
    >> >> function table regexp for U+1F469 manually, and now I get correct
    >> >> composition. That means we could process the
    >> >> RGI_Emoji_Modifier_Sequence entries from emoji-sequences.txt with
    >> >> emoji-zwj.awk and add them, indexed on the base character (and remove
    >> >> the above code).
    >> >>
    >> 
    >> This turns out to be a pretty small change, so if we donʼt get to the
    >> bottom of it we have an alternative.

    Eli> AFAIU, this means we will add 1F3FB..1F3FF to the characters that can
    Eli> follow each of those which have rules with zero lookback.

Yes. I won't get to that today though.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Wed, 22 Sep 2021 09:03:01 GMT) Full text and rfc822 format available.

Message #287 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Wed, 22 Sep 2021 11:02:04 +0200
>>>>> On Tue, 21 Sep 2021 21:28:40 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: rgm <at> gnu.org,  39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
    >> Date: Tue, 21 Sep 2021 19:43:05 +0200
    >> 
    >> Eli, is this the kind of thing you were thinking of? Seems to work so
    >> far (with a small addition to blocks.awk).

    Eli> Yes, with a minor comment below.

    >> We'll need to find a better name for the new arg than 'trigger'
    >> though.

    Eli> How about just 'ch'?  We use such names all over the place, so
    Eli> describing what it is in the comment should be enough.

OK

    >> @@ -3912,6 +3916,23 @@ font_range (ptrdiff_t pos, ptrdiff_t pos_byte, ptrdiff_t *limit,
    >> continue;
    >> if (NILP (font_object))
    >> {
    >> +	  if (EQ (CHAR_TABLE_REF (Vchar_script_table, trigger),
    >> +		  Qemoji))
    >> +	    {
    >> +	      Lisp_Object val = assq_no_quit (Qemoji, Vscript_representative_chars);
    >> +	      if (CONSP (val))
    >> +		{
    >> +		  int face_id;
    >> +		  val = XCDR (val);
    >> +		  if (CONSP (val))
    >> +		    val = XCAR (val);
    >> +		  else if (VECTORP (val))
    >> +		    val = AREF (val, 0);
    >> +		  c = XFIXNAT (val);
    >> +		  face_id = FACE_FOR_CHAR (f, face, c, pos - 1, string);
    >> +		  face = FACE_FROM_ID (f, face_id);
    >> +		}
    >> +	    }
    >> font_object = font_for_char (face, c, pos - 1, string);
    >> if (NILP (font_object))
    >> return Qnil;

    Eli> For backward compatibility, I'd prefer here, if font_for_char returns
    Eli> nil for the representative Emoji character, to call font_for_char
    Eli> again with the face and codepoint for the original character.

OK. I spoke a little too soon, itʼs not working 100% reliably,
sometimes the composition happens but the glyph for the preceding
codepoint is not from the emoji font.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Wed, 22 Sep 2021 13:48:02 GMT) Full text and rfc822 format available.

Message #290 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Wed, 22 Sep 2021 16:47:39 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: rgm <at> gnu.org,  39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
> Date: Wed, 22 Sep 2021 10:59:00 +0200
> 
>     Eli> AFAICT, the composition machinery doesn't allow to have a rule with
>     Eli> non-zero PREV-CHARS where PREV-CHARS are supposed to match characters
>     Eli> that have their own rules in composition-function-table.  Such rules
>     Eli> are never considered, because once the rules for a character are
>     Eli> processed, we never reconsider what to do with that character.  I will
>     Eli> ask Kenichi Handa whether my conclusion is indeed correct, but for
>     Eli> now, I think we should take the fire escape you mentioned earlier:
> 
> OK. Those are surprising semantics. If it does turn out to be like
> that, we should document it.

We could probably lift this restriction if we change the logic in
composition_compute_stop_pos: it currently returns as soon as it finds
the first character with a valid entry in composition-function-table,
instead of keeping looking until it finds the one whose rule's PATTERN
starts at the smallest character position.  But even if it turns out
there are no problems with such a change, I wouldn't do it for Emacs 28.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 24 Sep 2021 11:42:02 GMT) Full text and rfc822 format available.

Message #293 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>, mfabian <at> redhat.com
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 24 Sep 2021 13:41:07 +0200
>>>>> On Tue, 21 Sep 2021 21:20:07 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> This turns out to be a pretty small change, so if we donʼt get to the
    >> bottom of it we have an alternative.

    Eli> AFAIU, this means we will add 1F3FB..1F3FF to the characters that can
    Eli> follow each of those which have rules with zero lookback.

Yes. Iʼve now pushed exactly that to master. There are two types of
sequences that donʼt work:

1. Where the base character has Emoji_Presentation = No, hence we
donʼt consider it for composition. These are all in the U+2xxx range,
since we explicitly override this for those in the U+1xxxx range. They
do have Emoji_Modifier_Base = Yes, but we donʼt currently do anything
with that info. I guess if we managed to store it in a codepoint
property somewhere, we could teach set-fontset-font or the composition
code about it, but itʼs far too close to emacs-28 for that.

2. Ones I canʼt test because my version of Noto Color Emoji doesnʼt
have glyphs for the base character (essentially these are all the new
14.0 emoji codepoints).

(this does not include the change for choosing emoji presentation for
codepoints followed by VS-16; that still needs some work).

Thanks for the testing and the feedback so far.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 24 Sep 2021 12:05:02 GMT) Full text and rfc822 format available.

Message #296 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 24 Sep 2021 15:04:18 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
> Date: Fri, 24 Sep 2021 13:41:07 +0200
> 
> Yes. Iʼve now pushed exactly that to master. There are two types of
> sequences that donʼt work:
> 
> 1. Where the base character has Emoji_Presentation = No, hence we
> donʼt consider it for composition. These are all in the U+2xxx range,
> since we explicitly override this for those in the U+1xxxx range. They
> do have Emoji_Modifier_Base = Yes, but we donʼt currently do anything
> with that info. I guess if we managed to store it in a codepoint
> property somewhere, we could teach set-fontset-font or the composition
> code about it, but itʼs far too close to emacs-28 for that.

The idea was to make this work with the patch to font_range on which
you were working?

> (this does not include the change for choosing emoji presentation for
> codepoints followed by VS-16; that still needs some work).

That one is not just for VS-16, right?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 24 Sep 2021 12:11:01 GMT) Full text and rfc822 format available.

Message #299 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 24 Sep 2021 14:10:01 +0200
>>>>> On Fri, 24 Sep 2021 15:04:18 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
    >> Date: Fri, 24 Sep 2021 13:41:07 +0200
    >> 
    >> Yes. Iʼve now pushed exactly that to master. There are two types of
    >> sequences that donʼt work:
    >> 
    >> 1. Where the base character has Emoji_Presentation = No, hence we
    >> donʼt consider it for composition. These are all in the U+2xxx range,
    >> since we explicitly override this for those in the U+1xxxx range. They
    >> do have Emoji_Modifier_Base = Yes, but we donʼt currently do anything
    >> with that info. I guess if we managed to store it in a codepoint
    >> property somewhere, we could teach set-fontset-font or the composition
    >> code about it, but itʼs far too close to emacs-28 for that.

    Eli> The idea was to make this work with the patch to font_range on which
    Eli> you were working?

Yes. I donʼt know yet if that will be enough to allow removing the
overrides.

    >> (this does not include the change for choosing emoji presentation for
    >> codepoints followed by VS-16; that still needs some work).

    Eli> That one is not just for VS-16, right?

Right. Itʼs just that VS-16 make a convenient and colourful test case.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Fri, 24 Sep 2021 19:29:02 GMT) Full text and rfc822 format available.

Message #302 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, Eli Zaretskii <eliz <at> gnu.org>, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 24 Sep 2021 21:28:22 +0200
[Message part 1 (text/plain, inline)]
I have now tested with current git master
(last commit 35d0675467e61aff30c21544f87f55b1b1a2cfd3) and it is much
better!

Thank you very much!

I could still find one sequence which doesn‘t work though:

🏴‍☠ U+1F3F4 U+200D U+2620 pirate flag

Works in gedit (i.e. it works in pango/cairo/harfbuzz), see screenshot.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。

[Screenshot.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 25 Sep 2021 05:56:02 GMT) Full text and rfc822 format available.

Message #305 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rgm <at> gnu.org, rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 25 Sep 2021 08:55:02 +0300
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
> Date: Fri, 24 Sep 2021 21:28:22 +0200
> 
> I could still find one sequence which doesn‘t work though:
> 
> 🏴‍☠ U+1F3F4 U+200D U+2620 pirate flag
> 
> Works in gedit (i.e. it works in pango/cairo/harfbuzz), see screenshot.

Sounds like a bug in gedit: according to emoji-zwj-sequences.txt, this
is not a complete sequence, the final U+FE0F is missing.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 25 Sep 2021 07:36:02 GMT) Full text and rfc822 format available.

Message #308 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Mike FABIAN <mfabian <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 25 Sep 2021 09:35:03 +0200
Eli Zaretskii <eliz <at> gnu.org> さんはかきました:

>> From: Mike FABIAN <mfabian <at> redhat.com>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
>> Date: Fri, 24 Sep 2021 21:28:22 +0200
>> 
>> I could still find one sequence which doesn‘t work though:
>> 
>> 🏴‍☠ U+1F3F4 U+200D U+2620 pirate flag
>> 
>> Works in gedit (i.e. it works in pango/cairo/harfbuzz), see screenshot.
>
> Sounds like a bug in gedit: according to emoji-zwj-sequences.txt, this
> is not a complete sequence, the final U+FE0F is missing.

Ah, you are right, with the final U+FE0F it works, great!
Then I cannot find any sequence anymore which do not work except those
Robert already mentioned.

Something behaves slightly weird though when stepping with the cursor
over the following emojis:

👩🏽 🏴󠁧󠁢󠁥󠁮󠁧󠁿 🏴󠁧󠁢󠁳󠁣󠁴󠁿 🏴󠁧󠁢󠁷󠁬󠁳󠁿 🏳️‍🌈 🏳️‍⚧️ 🏴‍☠️

For the 3 British flags, 🏴󠁧󠁢󠁥󠁮󠁧󠁿 🏴󠁧󠁢󠁳󠁣󠁴󠁿 🏴󠁧󠁢󠁷󠁬󠁳󠁿, I need 7 times forward-char to step
over them (they are made from 7 code points). But for the other emoji in
that example (👩🏽 🏳️‍🌈 🏳️‍⚧️ 🏴‍☠️) I need only 1 forward-char to step over
each of them, even though they are also made from more than one code
point.

-- 
Mike FABIAN <mfabian <at> redhat.com>
睡眠不足はいい仕事の敵だ。





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 25 Sep 2021 09:21:01 GMT) Full text and rfc822 format available.

Message #311 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mike FABIAN <mfabian <at> redhat.com>
Cc: rgm <at> gnu.org, rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 25 Sep 2021 12:19:52 +0300
> From: Mike FABIAN <mfabian <at> redhat.com>
> Cc: rpluim <at> gmail.com,  rgm <at> gnu.org,  39799 <at> debbugs.gnu.org
> Date: Sat, 25 Sep 2021 09:35:03 +0200
> 
> Something behaves slightly weird though when stepping with the cursor
> over the following emojis:
> 
> 👩🏽 🏴󠁧󠁢󠁥󠁮󠁧󠁿 🏴󠁧󠁢󠁳󠁣󠁴󠁿 🏴󠁧󠁢󠁷󠁬󠁳󠁿 🏳️‍🌈 🏳️‍⚧️ 🏴‍☠️
> 
> For the 3 British flags, 🏴󠁧󠁢󠁥󠁮󠁧󠁿 🏴󠁧󠁢󠁳󠁣󠁴󠁿 🏴󠁧󠁢󠁷󠁬󠁳󠁿, I need 7 times forward-char to step
> over them (they are made from 7 code points). But for the other emoji in
> that example (👩🏽 🏳️‍🌈 🏳️‍⚧️ 🏴‍☠️) I need only 1 forward-char to step over
> each of them, even though they are also made from more than one code
> point.

Thanks, I installed a fix.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39799; Package emacs. (Sat, 06 Nov 2021 19:00:02 GMT) Full text and rfc822 format available.

Message #314 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, rpluim <at> gmail.com, 39799 <at> debbugs.gnu.org,
 Mike FABIAN <mfabian <at> redhat.com>
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Sat, 06 Nov 2021 19:59:26 +0100
This was a very long thread, but skimming it, I think that basically
Robert fixed what was under discussion here, so I'm closing this bug
report.  If there's more to be done here, a new report that addresses
those things in specific should be opened.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





bug marked as fixed in version 28.1, send any further explanations to 39799 <at> debbugs.gnu.org and Mike FABIAN <mfabian <at> redhat.com> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sat, 06 Nov 2021 19:00:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 05 Dec 2021 12:24:15 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 137 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.