GNU bug report logs - #63731
[PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate


Package: emacs;

Reported by: Steven Allen <steven <at> stebalien.com>

Date: Fri, 26 May 2023 03:19:01 UTC

Severity: normal

Tags: fixed, patch

Fixed in version 29.1

Done: Robert Pluim <rpluim <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 63731 in the body.
You can then email your comments to 63731 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 03:19:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Steven Allen <steven <at> stebalien.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 26 May 2023 03:19:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Steven Allen <steven <at> stebalien.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate
Date: Thu, 25 May 2023 20:18:02 -0700
[Message part 1 (text/plain, inline)]
This patch imports the full list from unicode.org instead of
special-casing a few characters as was done previously.

With this patch, '👍️' (1F44D FE0F) should look the same as '👍' (1F44D).
Without it, it will look like '👍‌️'.

As a simple regression test, '✔' (2714) should still display as "text" while '✔️'
(2714 FE0F) should still display as an emoji.

Fixes https://github.com/alphapapa/ement.el/issues/137

NOTE: I'm not a Unicode expert, nor do I understand how Emacs handles
Unicode (beyond what was required to implement this patch). But this
patch appears to work and I can't find any regressions.

[0001-Support-Emoji-Variation-Sequence-16-FE0F-where-appro.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 06:42:02 GMT) Full text and rfc822 format available.

Message #8 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Steven Allen <steven <at> stebalien.com>, Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 09:41:42 +0300
> From: Steven Allen <steven <at> stebalien.com>
> Date: Thu, 25 May 2023 20:18:02 -0700
> 
> This patch imports the full list from unicode.org instead of
> special-casing a few characters as was done previously.
> 
> With this patch, '👍️' (1F44D FE0F) should look the same as '👍' (1F44D).
> Without it, it will look like '👍‌️'.
> 
> As a simple regression test, '✔' (2714) should still display as "text" while '✔️'
> (2714 FE0F) should still display as an emoji.
> 
> Fixes https://github.com/alphapapa/ement.el/issues/137
> 
> NOTE: I'm not a Unicode expert, nor do I understand how Emacs handles
> Unicode (beyond what was required to implement this patch). But this
> patch appears to work and I can't find any regressions.

AFAIU, this change will populate composition-function-table for many
"normal" characters, including ASCII digits and symbol/punctuation
characters from the 0x2xxx blocks.  E.g., after you build Emacs with
this patch, what do the following evaluations yield:

  M-: (aref composition-function-table ?0) RET
  M-: (aref composition-function-table #x2122) RET

If they yield non-nil values, it could mean a dramatic slowdown of
redisplay with these characters, which is precisely what we wanted to
avoid when we made the decision which parts of the Unicode-defined
Emoji sequences to support in Emacs, and how to arrange for that
support to work.

The issue you cite is strange: according to the "C-u C-x =" display
there, Emacs did compose #x1f44d with VS-16 using the Noto Color Emoji
font, so I don't quite understand why VS-16 is then also shown as an
empty rectangle.  On my system Noto Color Emoji doesn't work, and "C-u
C-x =" says this instead:

  Composed with the following character(s) "️" using this font:
    harfbuzz:-outline-Noto Emoji-regular-normal-normal-mono-15-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 128077 422 19 2 17 14 2 nil]
    [0 1 65039 3 19 0 1 0 1 [0 0 0]]
  with these character(s):
    ️ (#xfe0f) VARIATION SELECTOR-16

which explains why I see two glyphs, not one.  But in the display
shown in the above issue, I see

  Composed with the following character(s) "️" using this font:
    ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-18-*-*-*-m-0-iso10646-1
  by these glyphs:
    [0 1 128077 569 22 0 23 17 5 [0 0 136]]
  with these character(s):
    ️ (#xfe0f) VARIATION SELECTOR-16

which describes only one glyph, not two.  So the result ought to be
what you expect.

Robert, what am I missing here?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 08:35:02 GMT) Full text and rfc822 format available.

Message #11 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, Steven Allen <steven <at> stebalien.com>
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 10:34:02 +0200
Disclaimer: I havenʼt looked at the patch yet.

>>>>> On Fri, 26 May 2023 09:41:42 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Steven Allen <steven <at> stebalien.com>
    >> Date: Thu, 25 May 2023 20:18:02 -0700
    >> 
    >> This patch imports the full list from unicode.org instead of
    >> special-casing a few characters as was done previously.
    >> 
    >> With this patch, '👍️' (1F44D FE0F) should look the same as '👍' (1F44D).
    >> Without it, it will look like '👍‌️'.
    >> 
    >> As a simple regression test, '✔' (2714) should still display as "text" while '✔️'
    >> (2714 FE0F) should still display as an emoji.
    >> 
    >> Fixes https://github.com/alphapapa/ement.el/issues/137
    >> 
    >> NOTE: I'm not a Unicode expert, nor do I understand how Emacs handles
    >> Unicode (beyond what was required to implement this patch). But this
    >> patch appears to work and I can't find any regressions.

    Eli> AFAIU, this change will populate composition-function-table for many
    Eli> "normal" characters, including ASCII digits and symbol/punctuation
    Eli> characters from the 0x2xxx blocks.  E.g., after you build Emacs with
    Eli> this patch, what do the following evaluations yield:

    Eli>   M-: (aref composition-function-table ?0) RET
    Eli>   M-: (aref composition-function-table #x2122) RET

    Eli> If they yield non-nil values, it could mean a dramatic slowdown of
    Eli> redisplay with these characters, which is precisely what we wanted to
    Eli> avoid when we made the decision which parts of the Unicode-defined
    Eli> Emoji sequences to support in Emacs, and how to arrange for that
    Eli> support to work.

Yes. We donʼt want to do composition checks for ASCII if we can avoid it.

    Eli> The issue you cite is strange: according to the "C-u C-x =" display
    Eli> there, Emacs did compose #x1f44d with VS-16 using the Noto Color Emoji
    Eli> font, so I don't quite understand why VS-16 is then also shown as an
    Eli> empty rectangle.  On my system Noto Color Emoji doesn't work, and "C-u
    Eli> C-x =" says this instead:

    Eli>   Composed with the following character(s) "️" using this font:
    Eli>     harfbuzz:-outline-Noto Emoji-regular-normal-normal-mono-15-*-*-*-c-*-iso10646-1
    Eli>   by these glyphs:
    Eli>     [0 1 128077 422 19 2 17 14 2 nil]
    Eli>     [0 1 65039 3 19 0 1 0 1 [0 0 0]]
    Eli>   with these character(s):
    Eli>     ️ (#xfe0f) VARIATION SELECTOR-16

    Eli> which explains why I see two glyphs, not one.  But in the display
    Eli> shown in the above issue, I see

    Eli>   Composed with the following character(s) "️" using this font:
    Eli>     ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-18-*-*-*-m-0-iso10646-1
    Eli>   by these glyphs:
    Eli>     [0 1 128077 569 22 0 23 17 5 [0 0 136]]
    Eli>   with these character(s):
    Eli>     ️ (#xfe0f) VARIATION SELECTOR-16

    Eli> which describes only one glyph, not two.  So the result ought to be
    Eli> what you expect.

I see the emoji followed by a blank box with Noto Color Emoji here. I
donʼt yet understand why.

    Eli> Robert, what am I missing here?

1F44D FE0F is a valid sequence according to TR51 (UTS #51, Unicode Emoji).

(aref composition-function-table #x1f44d)
=> (["\\(?:👍[🏻-🏿]\\)" 0 compose-gstring-for-graphic])

which means that the composition is being triggered by this entry:

(aref composition-function-table #xfe0f)
=> (["\\c.\\c^+" 1 compose-gstring-for-graphic] [nil 0 compose-gstring-for-graphic])

(time passes)

Ugh. The following fixes it for me:

diff --git a/lisp/composite.el b/lisp/composite.el
index fb8b76114f4..af86d1436d3 100644
--- a/lisp/composite.el
+++ b/lisp/composite.el
@@ -756,7 +756,7 @@ compose-gstring-for-dotted-circle
 ;; Allow for bootstrapping without uni-*.el.
 (when unicode-category-table
   (let ((elt `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
-	       [nil 0 compose-gstring-for-graphic])))
+	       )))
     (map-char-table
      #'(lambda (key val)
 	 (if (memq val '(Mn Mc Me))

Although the following is less invasive:

diff --git a/lisp/composite.el b/lisp/composite.el
index fb8b76114f4..333428f008a 100644
--- a/lisp/composite.el
+++ b/lisp/composite.el
@@ -762,6 +762,11 @@ compose-gstring-for-dotted-circle
 	 (if (memq val '(Mn Mc Me))
 	     (set-char-table-range composition-function-table key elt)))
      unicode-category-table))
+  ;; for Emoji presentation selector
+  (set-char-table-range
+   composition-function-table
+   #xFE0F
+    `([,(purecopy "\\c.\ufe0f") 1 compose-gstring-for-graphic]))
   ;; for dotted-circle
   (aset composition-function-table #x25CC
 	`([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))

Didnʼt we conclude that composition had some issues with multiple
entries for the same codepoint if there was a mix of forward- and
backward-looking regexps?

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 08:46:01 GMT) Full text and rfc822 format available.

Message #14 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 11:46:05 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Steven Allen <steven <at> stebalien.com>,  63731 <at> debbugs.gnu.org
> Date: Fri, 26 May 2023 10:34:02 +0200
> 
> Ugh. The following fixes it for me:
> 
> diff --git a/lisp/composite.el b/lisp/composite.el
> index fb8b76114f4..af86d1436d3 100644
> --- a/lisp/composite.el
> +++ b/lisp/composite.el
> @@ -756,7 +756,7 @@ compose-gstring-for-dotted-circle
>  ;; Allow for bootstrapping without uni-*.el.
>  (when unicode-category-table
>    (let ((elt `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
> -	       [nil 0 compose-gstring-for-graphic])))
> +	       )))

This is unacceptable, AFAIU.  We cannot drop support for (or change) the
correct display of mark characters, can we?

> Although the following is less invasive:
> 
> diff --git a/lisp/composite.el b/lisp/composite.el
> index fb8b76114f4..333428f008a 100644
> --- a/lisp/composite.el
> +++ b/lisp/composite.el
> @@ -762,6 +762,11 @@ compose-gstring-for-dotted-circle
>  	 (if (memq val '(Mn Mc Me))
>  	     (set-char-table-range composition-function-table key elt)))
>       unicode-category-table))
> +  ;; for Emoji presentation selector
> +  (set-char-table-range
> +   composition-function-table
> +   #xFE0F
> +    `([,(purecopy "\\c.\ufe0f") 1 compose-gstring-for-graphic]))
>    ;; for dotted-circle
>    (aset composition-function-table #x25CC
>  	`([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))

Can you please explain why the current setup doesn't work in this
case, even though "C-u C-x =" says the composition was done?  And how
the above patch fixes that?

> Didnʼt we conclude that composition had some issues with multiple
> entries for the same codepoint if there was a mix of forward- and
> backward-looking regexps?

Not sure I understand what this alludes to.  What mix of
forward- and backward-looking regexps do you see?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 11:15:01 GMT) Full text and rfc822 format available.

Message #17 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 13:14:27 +0200
>>>>> On Fri, 26 May 2023 11:46:05 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: Steven Allen <steven <at> stebalien.com>,  63731 <at> debbugs.gnu.org
    >> Date: Fri, 26 May 2023 10:34:02 +0200
    >> 
    >> Ugh. The following fixes it for me:
    >> 
    >> diff --git a/lisp/composite.el b/lisp/composite.el
    >> index fb8b76114f4..af86d1436d3 100644
    >> --- a/lisp/composite.el
    >> +++ b/lisp/composite.el
    >> @@ -756,7 +756,7 @@ compose-gstring-for-dotted-circle
    >>  ;; Allow for bootstrapping without uni-*.el.
    >>  (when unicode-category-table
    >>    (let ((elt `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
    >> -	       [nil 0 compose-gstring-for-graphic])))
    >> +	       )))

    Eli> This is unacceptable, AFAIU.  We cannot drop support for (or change) the
    Eli> correct display of mark characters, can we?

Right. Iʼll hold off pushing it 😃

    >> Although the following is less invasive:
    >> 
    >> diff --git a/lisp/composite.el b/lisp/composite.el
    >> index fb8b76114f4..333428f008a 100644
    >> --- a/lisp/composite.el
    >> +++ b/lisp/composite.el
    >> @@ -762,6 +762,11 @@ compose-gstring-for-dotted-circle
    >>  	 (if (memq val '(Mn Mc Me))
    >>  	     (set-char-table-range composition-function-table key elt)))
    >>       unicode-category-table))
    >> +  ;; for Emoji presentation selector
    >> +  (set-char-table-range
    >> +   composition-function-table
    >> +   #xFE0F
    >> +    `([,(purecopy "\\c.\ufe0f") 1 compose-gstring-for-graphic]))
    >>    ;; for dotted-circle
    >>    (aset composition-function-table #x25CC
    >>  	`([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))

    Eli> Can you please explain why the current setup doesn't work in this
    Eli> case, even though "C-u C-x =" says the composition was done?  And how
    Eli> the above patch fixes that?

Composition is done for 1f44d+fe0f, but I suspect that with the current
setup, composition is called again for FE0F, which results in the box
glyph. With the second patch we will only do backward-looking composition
for FE0F.

    >> Didnʼt we conclude that composition had some issues with multiple
    >> entries for the same codepoint if there was a mix of forward- and
    >> backward-looking regexps?

    Eli> Not sure I understand what this alludes to.  What mix of
    Eli> forward- and backward-looking regexps do you see?

Youʼre right, thereʼs no forward-looking regexp, only a backward-looking
one and one with no regexp. But itʼs undeniable that:

 [nil 0 compose-gstring-for-graphic]

causes the issue. Iʼve never been clear on the semantics of that.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 12:07:01 GMT) Full text and rfc822 format available.

Message #20 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 15:06:40 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: steven <at> stebalien.com,  63731 <at> debbugs.gnu.org
> Date: Fri, 26 May 2023 13:14:27 +0200
> 
>     >> Although the following is less invasive:
>     >> 
>     >> diff --git a/lisp/composite.el b/lisp/composite.el
>     >> index fb8b76114f4..333428f008a 100644
>     >> --- a/lisp/composite.el
>     >> +++ b/lisp/composite.el
>     >> @@ -762,6 +762,11 @@ compose-gstring-for-dotted-circle
>     >>  	 (if (memq val '(Mn Mc Me))
>     >>  	     (set-char-table-range composition-function-table key elt)))
>     >>       unicode-category-table))
>     >> +  ;; for Emoji presentation selector
>     >> +  (set-char-table-range
>     >> +   composition-function-table
>     >> +   #xFE0F
>     >> +    `([,(purecopy "\\c.\ufe0f") 1 compose-gstring-for-graphic]))
>     >>    ;; for dotted-circle
>     >>    (aset composition-function-table #x25CC
>     >>  	`([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))
> 
>     Eli> Can you please explain why the current setup doesn't work in this
>     Eli> case, even though "C-u C-x =" says the composition was done?  And how
>     Eli> the above patch fixes that?
> 
> Composition is done for 1f44d+fe0f, but I suspect that with the current
> setup, composition is called again for FE0F, which results in the box
> glyph. With the second patch we will only do backward-looking composition
> for FE0F.

OK, then I think we should install this on the emacs-29 branch.

> Youʼre right, thereʼs no forward-looking regexp, only a backward-looking
> one and one with no regexp. But itʼs undeniable that:
> 
>  [nil 0 compose-gstring-for-graphic]
> 
> causes the issue. Iʼve never been clear on the semantics of that.

It has special support in compose-gstring-for-graphic, see there.  The
doc string also says a few words about that.  We use this, e.g., in
describe-char display, where we sometimes need to show a single
combining character with no base character to combine it with.  I
think this is only relevant for accents and other such combining
characters, not for VS-n.
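To make the rule semantics above concrete, here is a simplified Python model of how one entry list in composition-function-table might be tried for a single character. It is a hypothetical sketch, not the C code Emacs actually runs: `find_rule`, `OLD_FE0F` and `NEW_FE0F` are invented names, and `BASE` and `MARK` are crude stand-ins (assumptions) for the `\c.` and `\c^` categories. It assumes the [PATTERN PREV-CHARS FUNC] shape, where PREV-CHARS says how many characters before C the match starts, and a nil PATTERN (only with PREV-CHARS 0) means C is composed alone. The lone-FE0F case stands in for Robert's hypothesis that FE0F gets examined again on its own.

```python
import re
from typing import Optional, Sequence, Tuple

# Crude stand-ins (assumptions) for Emacs character categories:
# MARK approximates "\c^" (combining chars, incl. variation selectors),
# BASE approximates "\c." (base characters).
MARK = "[\u0300-\u036f\ufe00-\ufe0f]"
BASE = "[^\u0300-\u036f\ufe00-\ufe0f]"

# A rule mirrors [PATTERN PREV-CHARS FUNC]; FUNC is left out of the model.
Rule = Tuple[Optional[str], int]


def find_rule(text: str, i: int, rules: Sequence[Rule]) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span the first applicable rule would
    compose for the character at index I, or None if no rule applies."""
    for pattern, prev_chars in rules:
        start = i - prev_chars
        if start < 0:
            continue  # not enough preceding characters for this rule
        if pattern is None:
            # Nil PATTERN (PREV-CHARS 0): compose the single character alone.
            return (i, i + 1)
        m = re.compile(pattern).match(text, start)
        if m and m.end() > i:  # the match must cover the trigger character
            return (start, m.end())
    return None


# Hypothetical model of FE0F's entries before the fix:
# the backward-looking mark rule plus the nil fallback.
OLD_FE0F = [(BASE + MARK + "+", 1), (None, 0)]
# Hypothetical model of FE0F's entry after the less invasive patch.
NEW_FE0F = [(BASE + "\ufe0f", 1)]
```

In this model, 1F44D FE0F composes as a pair under both entry lists, but a lone VS-16 only falls through to the nil rule and gets composed by itself with the old entries, which would match the extra box Robert describes.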

What does this issue mean for the other VS-n characters, though?
Should we perhaps install something similar for them as well?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 14:03:02 GMT) Full text and rfc822 format available.

Message #23 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 16:02:40 +0200
>>>>> On Fri, 26 May 2023 15:06:40 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> 
    >> Composition is done for 1f44d+fe0f, but I suspect that with the current
    >> setup, composition is called again for FE0F, which results in the box
    >> glyph. With the second patch we will only do backward-looking composition
    >> for FE0F.

    Eli> OK, then I think we should install this on the emacs-29 branch.

    >> Youʼre right, thereʼs no forward-looking regexp, only a backward-looking
    >> one and one with no regexp. But itʼs undeniable that:
    >> 
    >> [nil 0 compose-gstring-for-graphic]
    >> 
    >> causes the issue. Iʼve never been clear on the semantics of that.

    Eli> It has special support in compose-gstring-for-graphic, see there.  The
    Eli> doc string also says a few words about that.  We use this, e.g., in
    Eli> describe-char display, where we sometimes need to show a single
    Eli> combining character with no base character to combine it with.  I
    Eli> think this is only relevant for accents and other such combining
    Eli> characters, not for VS-n.

OK

    Eli> What does this issue mean for the other VS-n characters, though?
    Eli> Should we perhaps install something similar for them as well?

For VS-15 maybe? The following gets me text-presentation composition
with CHAR+FE0E and emoji-presentation composition with CHAR+FE0F:

diff --git a/lisp/composite.el b/lisp/composite.el
index fb8b76114f4..ada35010146 100644
--- a/lisp/composite.el
+++ b/lisp/composite.el
@@ -762,6 +762,11 @@ compose-gstring-for-dotted-circle
 	 (if (memq val '(Mn Mc Me))
 	     (set-char-table-range composition-function-table key elt)))
      unicode-category-table))
+  ;; for Emoji presentation selector
+  (set-char-table-range
+   composition-function-table
+   '(#xFE0E . #xFE0F)
+    `([,(purecopy "\\c.[\ufe0f\ufe0e]") 1 compose-gstring-for-graphic]))
   ;; for dotted-circle
   (aset composition-function-table #x25CC
 	`([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))
@@ -861,7 +866,7 @@ compose-gstring-for-variation-glyph
 ;; handled in font_range, we end up choosing the Emoji presentation
 ;; rather than the Text presentation.
 (let ((elt '([".." 1 compose-gstring-for-variation-glyph])))
-  (set-char-table-range composition-function-table '(#xFE00 . #xFE0E) elt)
+  (set-char-table-range composition-function-table '(#xFE00 . #xFE0D) elt)
   (set-char-table-range composition-function-table '(#xE0100 . #xE01EF) elt))
 
 (defun auto-compose-chars (func from to font-object string direction)

Although perhaps we could have both `compose-gstring-for-graphic' and
`compose-gstring-for-variation-glyph' for FE0E.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 14:56:01 GMT) Full text and rfc822 format available.

Message #26 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 17:55:26 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: steven <at> stebalien.com,  63731 <at> debbugs.gnu.org
> Date: Fri, 26 May 2023 16:02:40 +0200
> 
>     Eli> What does this issue mean for the other VS-n characters, though?
>     Eli> Should we perhaps install something similar for them as well?
> 
> For VS-15 maybe? The following gets me text-presentation composition
> with CHAR+FE0E and emoji-presentation composition with CHAR+FE0F:

Actually, I forgot about compose-gstring-for-variation-glyph.  My
question was actually whether the general setting in

  (let ((elt `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
	       [nil 0 compose-gstring-for-graphic])))
    (map-char-table
     #'(lambda (key val)
	 (if (memq val '(Mn Mc Me))
	     (set-char-table-range composition-function-table key elt)))
     unicode-category-table))

also affects the VS-n selectors.  But since the later setting of

  (let ((elt '([".." 1 compose-gstring-for-variation-glyph])))
    (set-char-table-range composition-function-table '(#xFE00 . #xFE0E) elt)
    (set-char-table-range composition-function-table '(#xE0100 . #xE01EF) elt))

takes care of all the VS-n selectors except VS-16, and your patch now
will take care of VS-16, it sounds like we don't need to care about
other VS-n selectors?

Or are you saying that without including VS-15, CHAR+FE0E is not
displayed using its text representation?

Did you test the proposed change with the admin/emoji-*.txt files, to
make sure they all still display OK?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 15:07:01 GMT) Full text and rfc822 format available.

Message #29 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Steven Allen <steven <at> stebalien.com>
To: Eli Zaretskii <eliz <at> gnu.org>, Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 08:06:11 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:
> AFAIU, this change will populate composition-function-table for many
> "normal" characters, including ASCII digits and symbol/punctuation
> characters from the 0x2xxx blocks.  E.g., after you build Emacs with
> this patch, what do the following evaluations yield:
>
>   M-: (aref composition-function-table ?0) RET
>   M-: (aref composition-function-table #x2122) RET
>
> If they yield non-nil values, it could mean dramatic slowdown of
> redisplay with these characters.

Both of these yield nil with this patch applied (and I haven't noticed
any performance regressions). But it looks like you and Robert have a
better patch so I'll leave you to it.

However, I'd like to draw your attention to the existing hard-coded
VS-16 table here:

https://git.savannah.gnu.org/cgit/emacs.git/tree/admin/unidata/emoji-zwj.awk?h=4b3de748b0b04407d2492500c77905de56de1180#n72

It feels like this should either be the full table (the one in the
patch) or it shouldn't exist at all. But again, I'm not the expert here.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 15:26:01 GMT) Full text and rfc822 format available.

Message #32 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 17:25:24 +0200
>>>>> On Fri, 26 May 2023 17:55:26 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: steven <at> stebalien.com,  63731 <at> debbugs.gnu.org
    >> Date: Fri, 26 May 2023 16:02:40 +0200
    >> 
    Eli> What does this issue mean for the other VS-n characters, though?
    Eli> Should we perhaps install something similar for them as well?
    >> 
    >> For VS-15 maybe? The following gets me text-presentation composition
    >> with CHAR+FE0E and emoji-presentation composition with CHAR+FE0F:

    Eli> Actually, I forgot about compose-gstring-for-variation-glyph.  My
    Eli> question was actually whether the general setting in

    Eli>   (let ((elt `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
    Eli> 	       [nil 0 compose-gstring-for-graphic])))
    Eli>     (map-char-table
    Eli>      #'(lambda (key val)
    Eli> 	 (if (memq val '(Mn Mc Me))
    Eli> 	     (set-char-table-range composition-function-table key elt)))
    Eli>      unicode-category-table))

    Eli> also affects the VS-n selectors.  But since the later setting of

    Eli>   (let ((elt '([".." 1 compose-gstring-for-variation-glyph])))
    Eli>     (set-char-table-range composition-function-table '(#xFE00 . #xFE0E) elt)
    Eli>     (set-char-table-range composition-function-table '(#xE0100 . #xE01EF) elt))

    Eli> takes care of all the VS-n selectors except VS-16, and your patch now
    Eli> will take care of VS-16, it sounds like we don't need to care about
    Eli> other VS-n selectors?

    Eli> Or are you saying that without including VS-15, CHAR+FE0E is not
    Eli> displayed using its text representation?

Not quite. If I donʼt have compose-gstring-for-graphic for VS-15, no
composition occurs for CHAR+FE0E. With my change youʼll get
composition, but itʼs still not 100% correct: CHAR+FE0E when CHAR is a
member of the emoji script will use emoji presentation, not text, but
the extra empty box will not show, so itʼs still an improvement.

    Eli> Did you test the proposed change with the admin/emoji-*.txt files, to
    Eli> make sure they all still display OK?

Yes. Iʼve also got a change that makes Emoji_Keycap_Sequence work, but
I think we can leave that for master.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 15:30:02 GMT) Full text and rfc822 format available.

Message #35 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Steven Allen <steven <at> stebalien.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 63731 <at> debbugs.gnu.org
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 17:29:41 +0200
>>>>> On Fri, 26 May 2023 08:06:11 -0700, Steven Allen <steven <at> stebalien.com> said:

    Steven> Eli Zaretskii <eliz <at> gnu.org> writes:
    >> AFAIU, this change will populate composition-function-table for many
    >> "normal" characters, including ASCII digits and symbol/punctuation
    >> characters from the 0x2xxx blocks.  E.g., after you build Emacs with
    >> this patch, what do the following evaluations yield:
    >> 
    >> M-: (aref composition-function-table ?0) RET
    >> M-: (aref composition-function-table #x2122) RET
    >> 
    >> If they yield non-nil values, it could mean dramatic slowdown of
    >> redisplay with these characters.

    Steven> Both of these yield nil with this patch applied (and I haven't noticed
    Steven> any performance regressions). But it looks like you and Robert have a
    Steven> better patch so I'll leave you to it.

Itʼs smaller, thatʼs for sure. And it will definitely be faster.

    Steven> However, I'd like to draw your attention to the existing hard-coded
    Steven> VS-16 table here:

    Steven> https://git.savannah.gnu.org/cgit/emacs.git/tree/admin/unidata/emoji-zwj.awk?h=4b3de748b0b04407d2492500c77905de56de1180#n72

    Steven> It feels like this should either be the full table (the one in the
    Steven> patch) or it shouldn't exist at all. But again, I'm not the expert here.

Welcome to the wonderful world of Unicode. The reason the table exists
is that there are codepoints that are *not* emoji, but theyʼre part of
emoji sequences, so we still need to treat them as emoji in some
situations. Why Unicode didnʼt just make them emoji I donʼt know.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 15:52:02 GMT) Full text and rfc822 format available.

Message #38 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 18:52:22 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: steven <at> stebalien.com,  63731 <at> debbugs.gnu.org
> Date: Fri, 26 May 2023 17:25:24 +0200
> 
>     Eli> Or are you saying that without including VS-15, CHAR+FE0E is not
>     Eli> displayed using its text representation?
> 
> Not quite. If I donʼt have compose-gstring-for-graphic for VS-15, no
> composition occurs for CHAR+FE0E. With my change youʼll get
> composition, but itʼs still not 100% correct: CHAR+FE0E when CHAR is a
> member of the emoji script will use emoji presentation, not text, but
> the extra empty box will not show, so itʼs still an improvement.

OK.  And what about CHAR+FE0E when CHAR is not an Emoji?

Anyway, I think you should install the patch on emacs-29, and we
should then try to fix the text-representation bug with VS-15 on
master.  (I guess it requires a change to font.c or something?)

>     Eli> Did you test the proposed change with the admin/emoji-*.txt files, to
>     Eli> make sure they all still display OK?
> 
> Yes. Iʼve also got a change that makes Emoji_Keycap_Sequence work, but
> I think we can leave that for master.

Depends on the solution, I guess.  Isn't it just a change to the
VS-16's entry in composition-function-table?  Or maybe a change in the
#x20e3's entry?  (Did we discuss the Emoji_Keycap_Sequence case before?)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 16:04:02 GMT)

Message #41 received at 63731 <at> debbugs.gnu.org:

From: Steven Allen <steven <at> stebalien.com>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 63731 <at> debbugs.gnu.org
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 09:03:28 -0700
Robert Pluim <rpluim <at> gmail.com> writes:
> Welcome to the wonderful world of Unicode. The reason the table exists
> is that there are codepoints that are *not* emoji, but theyʼre part of
> emoji sequences, so we still need to treat them as emoji in some
> situations. Why Unicode didnʼt just make them emoji I donʼt know.

Got it... It sounds like the "correct" solution is to download the full
list (emoji-variation-sequences.txt) and filter for non-emoji
characters, but I guess that's overkill.

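Steven's "correct" solution, deriving the list from emoji-variation-sequences.txt instead of hand-maintaining it, could look roughly like the Python sketch below. It is not what the patch does. It assumes the `BASE SELECTOR ; text style|emoji style ; # comment` line layout of that file, parses a small inline sample instead of downloading anything, and uses a hypothetical `emoji_script` set as a stand-in for the characters Emacs assigns to the emoji script.

```python
import re

# Data lines in emoji-variation-sequences.txt are assumed to look like
#   "0023 FE0F  ; emoji style; # (1.1) NUMBER SIGN"
LINE_RE = re.compile(r"^([0-9A-F]{4,6}) (FE0[EF])\s*;\s*(?:text|emoji) style\s*;")

SAMPLE = """\
# Comment lines and blank lines are skipped.

0023 FE0E  ; text style;  # (1.1) NUMBER SIGN
0023 FE0F  ; emoji style; # (1.1) NUMBER SIGN
2714 FE0E  ; text style;  # (1.1) HEAVY CHECK MARK
2714 FE0F  ; emoji style; # (1.1) HEAVY CHECK MARK
1F44D FE0E ; text style;  # (6.0) THUMBS UP SIGN
1F44D FE0F ; emoji style; # (6.0) THUMBS UP SIGN
"""

def parse_variation_sequences(text):
    """Map each base codepoint to the set of selectors listed for it."""
    table = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            base, selector = int(m.group(1), 16), int(m.group(2), 16)
            table.setdefault(base, set()).add(selector)
    return table

def non_emoji_fe0f_bases(table, emoji_script):
    """Bases that have an FE0F sequence but are not in EMOJI_SCRIPT."""
    return sorted(cp for cp, selectors in table.items()
                  if 0xFE0F in selectors and cp not in emoji_script)

# Hypothetical: pretend only U+1F44D belongs to the emoji script.
table = parse_variation_sequences(SAMPLE)
print([f"U+{cp:04X}" for cp in non_emoji_fe0f_bases(table, {0x1F44D})])
# prints ['U+0023', 'U+2714']
```

Filtering against the emoji script is what separates "needs an explicit entry" from "already handled as emoji"; whether that is worth the extra build step is the trade-off Steven raises.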
Thanks!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 16:25:02 GMT)

Message #44 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 18:24:02 +0200
>>>>> On Fri, 26 May 2023 18:52:22 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: steven <at> stebalien.com,  63731 <at> debbugs.gnu.org
    >> Date: Fri, 26 May 2023 17:25:24 +0200
    >> 
    Eli> Or are you saying that without including VS-15, CHAR+FE0E is not
    Eli> displayed using its text representation?
    >> 
    >> Not quite. If I donʼt have compose-gstring-for-graphic for VS-15, no
    >> composition occurs for CHAR+FE0E. With my change youʼll get
    >> composition, but itʼs still not 100% correct: CHAR+FE0E when CHAR is a
    >> member of the emoji script will use emoji presentation, not text, but
    >> the extra empty box will not show, so itʼs still an improvement.

    Eli> OK.  And what about CHAR+FE0E when CHAR is not an Emoji?

Then you get the (composed) text presentation (and the composed emoji
presentation when itʼs CHAR+FE0F).

    Eli> Anyway, I think you should install the patch on emacs-29, and we
    Eli> should then try to fix the text-representation bug with VS-15 on
    Eli> master.  (I guess it requires a change to font.c or something?)

It requires something that answers the question "what font would we
use for this codepoint if it was not an emoji?". Maybe we can have a
separate fontset that pretends that the emoji script is equivalent to
symbol? Or invent some kind of 'text-presentation-font' property to
put somewhere?

    Eli> Did you test the proposed change with the admin/emoji-*.txt files, to
    Eli> make sure they all still display OK?
    >> 
    >> Yes. Iʼve also got a change that makes Emoji_Keycap_Sequence work, but
    >> I think we can leave that for master.

    Eli> Depends on the solution, I guess.  Isn't it just a change to the
    Eli> VS-16's entry in composition-function-table?  Or maybe a change in the
    Eli> #x20e3's entry?  (Did we discuss the Emoji_Keycap_Sequence case before?)

Itʼs a change to the VS-16 entry. We did discuss it before, and
decided to put it aside because the solutions all involved adding
composition-function-table entries for 0-9 or similar. I donʼt
remember why we didnʼt consider adding to VS-16ʼs entry.

Iʼll do some more testing, and post a final version hopefully this
weekend sometime.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 17:28:02 GMT)

Message #47 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 20:27:26 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Fri, 26 May 2023 18:24:02 +0200
> 
> >>>>> On Fri, 26 May 2023 18:52:22 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     Eli> Anyway, I think you should install the patch on emacs-29, and we
>     Eli> should then try to fix the text-representation bug with VS-15 on
>     Eli> master.  (I guess it requires a change to font.c or something?)
> 
> It requires something that answers the question "what font would we
> use for this codepoint if it was not an emoji?". Maybe we can have a
> separate fontset that pretends that the emoji script is equivalent to
> symbol? Or invent some kind of 'text-presentation-font' property to
> put somewhere?

I'm not sure I understand why we don't select the right font by
default.  Selecting a non-Emoji font for non-Emoji codepoints should
not need any special tricks.

>     >> Yes. Iʼve also got a change that makes Emoji_Keycap_Sequence work, but
>     >> I think we can leave that for master.
> 
>     Eli> Depends on the solution, I guess.  Isn't it just a change to the
>     Eli> VS-16's entry in composition-function-table?  Or maybe a change in the
>     Eli> #x20e3's entry?  (Did we discuss the Emoji_Keycap_Sequence case before?)
> 
> Itʼs a change to the VS-16 entry. We did discuss it before, and
> decided to put it aside because the solutions all involved adding
> composition-function-table entries for 0-9 or similar. I donʼt
> remember why we didnʼt consider adding to VS-16ʼs entry.
> 
> Iʼll do some more testing, and post a final version hopefully this
> weekend sometime.

OK, thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 17:37:01 GMT)

Message #50 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 19:35:56 +0200
>>>>> On Fri, 26 May 2023 20:27:26 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Fri, 26 May 2023 18:24:02 +0200
    >> 
    >> >>>>> On Fri, 26 May 2023 18:52:22 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
    >> 
    Eli> Anyway, I think you should install the patch on emacs-29, and we
    Eli> should then try to fix the text-representation bug with VS-15 on
    Eli> master.  (I guess it requires a change to font.c or something?)
    >> 
    >> It requires something that answers the question "what font would we
    >> use for this codepoint if it was not an emoji?". Maybe we can have a
    >> separate fontset that pretends that the emoji script is equivalent to
    >> symbol? Or invent some kind of 'text-presentation-font' property to
    >> put somewhere?

    Eli> I'm not sure I understand why we don't select the right font by
    Eli> default.  Selecting a non-Emoji font for non-Emoji codepoints should
    Eli> not need any special tricks.

It doesnʼt, but in this case it *is* an emoji codepoint, so it displays
as emoji because of font.c, even when followed by VS-15.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 17:44:01 GMT)

Message #53 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rpluim <at> gmail.com, 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 20:43:37 +0300
> Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
> Date: Fri, 26 May 2023 20:27:26 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > From: Robert Pluim <rpluim <at> gmail.com>
> > Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> > Date: Fri, 26 May 2023 18:24:02 +0200
> > 
> > It requires something that answers the question "what font would we
> > use for this codepoint if it was not an emoji?". Maybe we can have a
> > separate fontset that pretends that the emoji script is equivalent to
> > symbol? Or invent some kind of 'text-presentation-font' property to
> > put somewhere?
> 
> I'm not sure I understand why we don't select the right font by
> default.  Selecting a non-Emoji font for non-Emoji codepoints should
> not need any special tricks.

Actually, I don't understand why there's an issue here with font
selection.  Are you saying that using Noto Color Emoji with
CHAR+0xFE0E, when CHAR is an Emoji character, doesn't produce the
textual representation of CHAR?  If so, isn't that a problem with the
font?  I thought all we needed to do was to hand the combination to an
Emoji-aware font, and the font would do the rest.  Now you seem to be
saying that we somehow need to select a non-Emoji font?  But if so,
who'd guarantee that a font that cannot display Emoji will know what
to do with the combination CHAR+0xFE0E?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 26 May 2023 18:06:02 GMT)

Message #56 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 26 May 2023 21:05:47 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Fri, 26 May 2023 19:35:56 +0200
> 
> in this case it *is* an emoji codepoint, so it displays
> as emoji because of font.c, even when followed by VS-15.

If we pass to an Emoji-capable font a sequence of a character followed
by VS-15, I'd expect the font to produce a glyph with the textual
representation of that character.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Sun, 28 May 2023 10:30:03 GMT)

Message #59 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Sun, 28 May 2023 12:29:48 +0200
>>>>> On Fri, 26 May 2023 20:43:37 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
    >> Date: Fri, 26 May 2023 20:27:26 +0300
    >> From: Eli Zaretskii <eliz <at> gnu.org>
    >> 
    >> > From: Robert Pluim <rpluim <at> gmail.com>
    >> > Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> > Date: Fri, 26 May 2023 18:24:02 +0200
    >> > 
    >> > It requires something that answers the question "what font would we
    >> > use for this codepoint if it was not an emoji?". Maybe we can have a
    >> > separate fontset that pretends that the emoji script is equivalent to
    >> > symbol? Or invent some kind of 'text-presentation-font' property to
    >> > put somewhere?
    >> 
    >> I'm not sure I understand why we don't select the right font by
    >> default.  Selecting a non-Emoji font for non-Emoji codepoints should
    >> not need any special tricks.

    Eli> Actually, I don't understand why there's an issue here with font
    Eli> selection.  Are you saying that using Noto Color Emoji with
    Eli> CHAR+0xFE0E, when CHAR is an Emoji character, doesn't produce the
    Eli> textual representation of CHAR?  If so, isn't that a problem with the
    Eli> font?  I thought all we needed to do was to hand the combination to an
    Eli> Emoji-aware font, and the font would do the rest.  Now you seem to be
    Eli> saying that we somehow need to select a non-Emoji font?  But if so,
    Eli> who'd guarantee that a font that cannot display Emoji will know what
    Eli> to do with the combination CHAR+0xFE0E?

Iʼm not sure: gedit displays the text representation, and LibreOffice
displays the emoji presentation. And the Google color emoji website
only shows colour glyphs. So I think itʼs up to the application to
select the correct font.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Sun, 28 May 2023 11:44:01 GMT)

Message #62 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Sun, 28 May 2023 13:43:13 +0200
>>>>> On Fri, 26 May 2023 21:05:47 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Fri, 26 May 2023 19:35:56 +0200
    >> 
    >> in this case it *is* an emoji codepoint, so it displays
    >> as emoji because of font.c, even when followed by VS-15.

    Eli> If we pass to an Emoji-capable font a sequence of a character followed
    Eli> by VS-15, I'd expect the font to produce a glyph with the textual
    Eli> representation of that character.

But we donʼt do that: we ask the font "give me a glyph for this codepoint".

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Sun, 28 May 2023 11:58:02 GMT)

Message #65 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Sun, 28 May 2023 13:57:49 +0200
>>>>> On Fri, 26 May 2023 20:27:26 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
    >> 
    >> Itʼs a change to the VS-16 entry. We did discuss it before, and
    >> decided to put it aside because the solutions all involved adding
    >> composition-function-table entries for 0-9 or similar. I donʼt
    >> remember why we didnʼt consider adding to VS-16ʼs entry.
    >> 
    >> Iʼll do some more testing, and post a final version hopefully this
    >> weekend sometime.

    Eli> OK, thanks.

Eli, if the 20e3 changes are too much for emacs-29, I can put them in
master.

Iʼll put some notes in admin/notes/unicode as well.

diff --git c/admin/unidata/emoji-zwj.awk i/admin/unidata/emoji-zwj.awk
index 7d2ff6cb900..0b6f1267205 100644
--- c/admin/unidata/emoji-zwj.awk
+++ i/admin/unidata/emoji-zwj.awk
@@ -82,6 +82,7 @@ END {
      trigger_codepoints[11] = "1F574"
      trigger_codepoints[12] = "1F575"
      trigger_codepoints[13] = "1F590"
+     trigger_codepoints[14] = "20E3"
 
      printf "(setq auto-composition-emoji-eligible-codepoints\n"
      printf "'("
diff --git c/lisp/composite.el i/lisp/composite.el
index fb8b76114f4..acba4e73c17 100644
--- c/lisp/composite.el
+++ i/lisp/composite.el
@@ -762,6 +762,23 @@ compose-gstring-for-dotted-circle
 	 (if (memq val '(Mn Mc Me))
 	     (set-char-table-range composition-function-table key elt)))
      unicode-category-table))
+  ;; for Emoji presentation selector
+  ;; We don't want the generic nil 0 entry because it causes display
+  ;; of an extra box for FE0F.  (Bug#63731)
+  ;; This also covers the fully-qualified enclosing keycap case.
+  (set-char-table-range
+   composition-function-table
+   #xFE0E
+   `([,(purecopy "\\c.\ufe0e") 1 compose-gstring-for-graphic]))
+  (set-char-table-range
+   composition-function-table
+   #xFE0F
+   `([,(purecopy "\\c.\ufe0f\u20e3?") 1 compose-gstring-for-graphic]))
+  ;; for unqualified enclosing keycap
+  (set-char-table-range
+   composition-function-table
+   #x20E3
+   `([,(purecopy "[#*0-9]\u20e3") 1 compose-gstring-for-graphic]))
   ;; for dotted-circle
   (aset composition-function-table #x25CC
 	`([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))
@@ -857,11 +874,10 @@ compose-gstring-for-variation-glyph
 ;; taken care of by font_range in font.c, which will check for an
 ;; emoji font for codepoints used in compositions even if they're not
 ;; emoji themselves, and thus choose the Emoji presentation for them
-;; when followed by VS-16.  VS-15 *is* handled here, because if it's
-;; handled in font_range, we end up choosing the Emoji presentation
-;; rather than the Text presentation.
+;; when followed by VS-16.  VS-15 is handled by the setup around
+;; unicode-category-table above.
 (let ((elt '([".." 1 compose-gstring-for-variation-glyph])))
-  (set-char-table-range composition-function-table '(#xFE00 . #xFE0E) elt)
+  (set-char-table-range composition-function-table '(#xFE00 . #xFE0D) elt)
   (set-char-table-range composition-function-table '(#xE0100 . #xE01EF) elt))
 
 (defun auto-compose-chars (func from to font-object string direction)


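To make the three new composition-function-table entries in the patch easier to reason about, here is a rough Python model of what each regexp is meant to match. It is only an approximation: Emacs's `\c.` category (base characters) has no Python equivalent and is modelled here as "any character other than a variation selector or U+20E3", and `PREV_CHARS = 1` mirrors the `1` in each entry vector, which anchors the pattern one character before the trigger character. The helper `composed_span` is hypothetical and does not reproduce composite.c.

```python
import re

# Approximation of Emacs' "\c." (base character category): anything that
# is not a variation selector (U+FE00..U+FE0F) or COMBINING ENCLOSING KEYCAP.
BASE = "[^\uFE00-\uFE0F\u20E3]"

# Trigger codepoint -> pattern, transcribed from the patch.
PATTERNS = {
    0xFE0E: re.compile(BASE + "\uFE0E"),         # text presentation
    0xFE0F: re.compile(BASE + "\uFE0F\u20E3?"),  # emoji presentation, qualified keycap
    0x20E3: re.compile("[#*0-9]\u20E3"),         # unqualified keycap
}
PREV_CHARS = 1  # every entry in the patch uses 1

def composed_span(text, i):
    """Return the substring the entry for text[i] would match, or None.

    The pattern is anchored PREV_CHARS characters before the trigger.
    """
    pattern = PATTERNS.get(ord(text[i]))
    start = i - PREV_CHARS
    if pattern is None or start < 0:
        return None
    m = pattern.match(text, start)
    return m.group(0) if m else None

for s in ["#\uFE0F\u20E3", "#\u20E3", "a\u20E3", "\U0001F44D\uFE0F", "\u2714\uFE0E"]:
    print([f"U+{ord(c):04X}" for c in s], "->", composed_span(s, 1) is not None)
```

The fully-qualified keycap (`#` U+FE0F U+20E3) is picked up by the U+FE0F entry's optional `\u20E3?`, while the unqualified form (`#` U+20E3) needs its own U+20E3 entry, which is why the patch touches both.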

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Sun, 28 May 2023 12:38:01 GMT)

Message #68 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Sun, 28 May 2023 15:37:48 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Sun, 28 May 2023 12:29:48 +0200
> 
> >>>>> On Fri, 26 May 2023 20:43:37 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     Eli> Actually, I don't understand why there's an issue here with font
>     Eli> selection.  Are you saying that using Noto Color Emoji with
>     Eli> CHAR+0xFE0E, when CHAR is an Emoji character, doesn't produce the
>     Eli> textual representation of CHAR?  If so, isn't that a problem with the
>     Eli> font?  I thought all we needed to do was to hand the combination to an
>     Eli> Emoji-aware font, and the font would do the rest.  Now you seem to be
>     Eli> saying that we somehow need to select a non-Emoji font?  But if so,
>     Eli> who'd guarantee that a font that cannot display Emoji will know what
>     Eli> to do with the combination CHAR+0xFE0E?
> 
> Iʼm not sure: gedit displays the text representation, and LibreOffice
> displays the emoji presentation. And the Google color emoji website
> only shows colour glyphs. So I think itʼs up to the application to
> select the correct font.

But what is "the correct font", when the sequence of codepoints is
CHAR+0xFE0E?  How do we identify such a font?  Do you know of a font
that produces the correct glyph for this sequence, when HarfBuzz is
used as the shaping engine?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Sun, 28 May 2023 12:44:02 GMT)

Message #71 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Sun, 28 May 2023 15:44:00 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Sun, 28 May 2023 13:43:13 +0200
> 
> >>>>> On Fri, 26 May 2023 21:05:47 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     >> From: Robert Pluim <rpluim <at> gmail.com>
>     >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
>     >> Date: Fri, 26 May 2023 19:35:56 +0200
>     >> 
>     >> in this case it *is* an emoji codepoint, so it displays
>     >> as emoji because of font.c, even when followed by VS-15.
> 
>     Eli> If we pass to an Emoji-capable font a sequence of a character followed
>     Eli> by VS-15, I'd expect the font to produce a glyph with the textual
>     Eli> representation of that character.
> 
> But we donʼt do that: we ask the font "give me a glyph for this codepoint".

Is that because of the composition-function-table's entry for VS-15?
Maybe we should augment that, then?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Sun, 28 May 2023 12:47:01 GMT)

Message #74 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Sun, 28 May 2023 15:47:11 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Sun, 28 May 2023 13:57:49 +0200
> 
> Eli, if the 20e3 changes are too much for emacs-29, I can put them in
> master.

Yeah, I think it should go to master for now.

Otherwise, LGTM, thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 29 May 2023 10:46:02 GMT)

Message #77 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 29 May 2023 12:44:58 +0200
>>>>> On Sun, 28 May 2023 15:47:11 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Sun, 28 May 2023 13:57:49 +0200
    >> 
    >> Eli, if the 20e3 changes are too much for emacs-29, I can put them in
    >> master.

    Eli> Yeah, I think it should go to master for now.

I pushed the doc changes, but not the code changes, because I now
think theyʼre papering over a deeper bug (which weʼve noticed before,
but didnʼt fix then).

In all these cases, consider the sequence U+1F44D U+FE0F

- emacs-29:

    Displays as colour emoji, followed by an empty box

- emacs-29 with the following change in composite.el:

      (set-char-table-range
       composition-function-table
       #xFE0F
       `([,(purecopy "\\c.\ufe0f") 1 compose-gstring-for-graphic]))

    Displays as colour emoji. Much rejoicing. If I follow my own
    advice, and customize `glyphless-char-display-control' to show
    hex-boxes for variation selectors, you then see that in actual
    fact, we are still displaying the FE0F, but since it uses
    thin-space by default, it wasnʼt obvious. Much sadness.

    C-u C-x =:

                  display: composed to form "👍️" (see below)

    Composed with the following character(s) "️" using this font:
      ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-13-*-*-*-m-0-iso10646-1
    by these glyphs:
      [0 1 128077 569 16 0 17 13 4 nil]
    with these character(s):
      ️ (#xfe0f) VARIATION SELECTOR-16

Now I notice (via emoji-variation-sequences.txt) that this is only
happening for the following codepoints:

   U+1F408
   U+1F415
   U+1F426
   U+1F446
   U+1F447
   U+1F448
   U+1F449
   U+1F44D
   U+1F44E

And if I look in lisp/international/emoji-zwj.el, I find:

(#x1F44D .
,(eval-when-compile (regexp-opt
'(
"\N{U+1F44D}\N{U+1F3FB}"
"\N{U+1F44D}\N{U+1F3FC}"
"\N{U+1F44D}\N{U+1F3FD}"
"\N{U+1F44D}\N{U+1F3FE}"
"\N{U+1F44D}\N{U+1F3FF}"
))))

If I add

"\N{U+1F44D}\N{U+FE0F}"

to that, and undo the composite.el change, then everything is
fine. Hurrah! This means that the

`([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
	       [nil 0 compose-gstring-for-graphic])

is not doing the right thing for this case.

I can change the emoji-zwj.awk script to add CHAR+FE0F for all emoji,
unless someone knows how to fix composition to do the right thing
here.

(there are similar issues with CHAR+FE0E)

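The emoji-zwj.awk workaround proposed above (emit a BASE+U+FE0F alternative for every trigger codepoint) can be modelled roughly in Python. The dictionary shape is a hypothetical simplification of the emoji-zwj.el alist quoted earlier, and `build_regexp` is a crude stand-in for `regexp-opt`, not a reimplementation of it.

```python
import re

VS16 = "\uFE0F"

def add_vs16_sequences(table):
    """Return a copy of TABLE with BASE + U+FE0F added for every trigger.

    TABLE maps a trigger codepoint to the sequences that start with it
    (a simplified, hypothetical model of the emoji-zwj.el alist).
    """
    result = {}
    for base, seqs in table.items():
        extra = chr(base) + VS16
        result[base] = seqs if extra in seqs else seqs + [extra]
    return result

def build_regexp(seqs):
    """Crude regexp-opt stand-in: an alternation, longest sequences first."""
    ordered = sorted(seqs, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))

# Skin-tone sequences for U+1F44D (U+1F3FB..U+1F3FF), as in the excerpt above.
thumbs_up = 0x1F44D
table = {thumbs_up: [chr(thumbs_up) + chr(m) for m in range(0x1F3FB, 0x1F400)]}

text = chr(thumbs_up) + VS16
before = build_regexp(table[thumbs_up]).match(text)
after = build_regexp(add_vs16_sequences(table)[thumbs_up]).match(text)
print(before is None, after.group(0) == text)  # prints: True True
```

Like the manual edit to emoji-zwj.el, this only adds entries; it does not explain why the generic U+FE0F entry is not applied after a codepoint that has its own composition-function-table entry.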
Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 29 May 2023 13:59:02 GMT)

Message #80 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 29 May 2023 16:58:43 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Mon, 29 May 2023 12:44:58 +0200
> 
> In all these cases, consider the sequence U+1F44D U+FE0F
> 
> - emacs-29:
> 
>     Displays as colour emoji, followed by an empty box
> 
> - emacs-29 with the following change in composite.el:
> 
>       (set-char-table-range
>        composition-function-table
>        #xFE0F
>        `([,(purecopy "\\c.\ufe0f") 1 compose-gstring-for-graphic]))
> 
>     Displays as colour emoji. Much rejoicing. If I follow my own
>     advice, and customize `glyphless-char-display-control' to show
>     hex-boxes for variation selectors, you then see that in actual
>     fact, we are still displaying the FE0F, but since it uses
>     thin-space by default, it wasnʼt obvious. Much sadness.
> 
>     C-u C-x =:
> 
>                   display: composed to form "👍️" (see below)

This is not what I see.  I didn't use the above set-char-table-range
expression literally, but instead started "emacs -Q", and then
evaluated in *scratch*:

      (set-char-table-range
       composition-function-table
       #xFE0F
       '(["\\c.\ufe0f" 1 compose-gstring-for-graphic]))

After that, the sequence U+1F44D U+FE0F displays as a single glyph,
and there's no thin space after it.  What am I missing?  Is this
somehow specific to ftcrhb font driver or something?

> Now I notice (via emoji-variation-sequences.txt) that this is only
> happening for the following codepoints:
> 
>    U+1F408
>    U+1F415
>    U+1F426
>    U+1F446
>    U+1F447
>    U+1F448
>    U+1F449
>    U+1F44D
>    U+1F44E
> 
> And if I look in lisp/international/emoji-zwj.el, I find:
> 
> (#x1F44D .
> ,(eval-when-compile (regexp-opt
> '(
> "\N{U+1F44D}\N{U+1F3FB}"
> "\N{U+1F44D}\N{U+1F3FC}"
> "\N{U+1F44D}\N{U+1F3FD}"
> "\N{U+1F44D}\N{U+1F3FE}"
> "\N{U+1F44D}\N{U+1F3FF}"
> ))))
> 
> If I add
> 
> "\N{U+1F44D}\N{U+FE0F}"
> 
> to that, and undo the composite.el change, then everything is
> fine. Hurrah! This means that the
> 
> `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
> 	       [nil 0 compose-gstring-for-graphic])
> 
> is not doing the right thing for this case.

You are saying that the entry in composition-function-table for
U+1F44D (and other similar characters) is used in preference to the
entry for U+FE0F that follows it, even though there's no U+1F3FB
etc. after it to "steal" the composition?  Did you try stepping
through composite.c to see whether and why this is the case?

> I can change the emoji-zwj.awk script to add CHAR+FE0F for all emoji,
> unless someone knows how to fix composition to do the right thing
> here.

I think we need first to understand the issue at hand better.  There's
more here than meets the eye, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 29 May 2023 14:44:02 GMT)

Message #83 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 29 May 2023 16:43:00 +0200
>>>>> On Mon, 29 May 2023 16:58:43 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> display: composed to form "👍️" (see below)

    Eli> This is not what I see.  I didn't use the above set-char-table-range
    Eli> expression literally, but instead started "emacs -Q", and then
    Eli> evaluated in *scratch*:

    Eli>       (set-char-table-range
    Eli>        composition-function-table
    Eli>        #xFE0F
    Eli>        '(["\\c.\ufe0f" 1 compose-gstring-for-graphic]))

    Eli> After that, the sequence U+1F44D U+FE0F displays as a single glyph,
    Eli> and there's no thin space after it.  What am I missing?  Is this
    Eli> somehow specific to ftcrhb font driver or something?

Itʼs a single glyph, but that glyph contains a thin-space. I used this
to check; the second 'a' is slightly offset:

👍️a
👍a

This persists if I disable HarfBuzz, and it behaves the same on macOS.

    Eli> You are saying that the entry in composition-function-table for
    Eli> U+1F44D (and other similar characters) is used in preference to the
    Eli> entry for U+FE0F that follows it, even though there's no U+1F3FB
    Eli> etc. after it to "steal" the composition?  Did you try stepping
    Eli> through composite.c to see whether and why this is the case?

Right. It looks like the FE0F entry is ignored. Iʼve not ventured into
composite.c yet.

    >> I can change the emoji-zwj.awk script to add CHAR+FE0F for all emoji,
    >> unless someone knows how to fix composition to do the right thing
    >> here.

    Eli> I think we need first to understand the issue at hand better.  There's
    Eli> more here than meets the eye, I think.

Absolutely

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 29 May 2023 14:56:02 GMT)

Message #86 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 29 May 2023 17:55:49 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Mon, 29 May 2023 16:43:00 +0200
> 
> >>>>> On Mon, 29 May 2023 16:58:43 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     >> display: composed to form "👍️" (see below)
> 
>     Eli> This is not what I see.  I didn't use the above set-char-table-range
>     Eli> expression literally, but instead started "emacs -Q", and then
>     Eli> evaluated in *scratch*:
> 
>     Eli>       (set-char-table-range
>     Eli>        composition-function-table
>     Eli>        #xFE0F
>     Eli>        '(["\\c.\ufe0f" 1 compose-gstring-for-graphic]))
> 
>     Eli> After that, the sequence U+1F44D U+FE0F displays as a single glyph,
>     Eli> and there's no thin space after it.  What am I missing?  Is this
>     Eli> somehow specific to ftcrhb font driver or something?
> 
> Itʼs a single glyph, but that glyph contains a thin-space. I used this
> to check; the second 'a' is slightly offset:
> 
> 👍️a
> 👍a

That's because the first one shows two glyphs that are
"pseudo-composed": not by the font, but by our hand-made "composition"
in compose-gstring-for-graphic.  Try this instead:

      (set-char-table-range
       composition-function-table
       #xFE0F
       '(["\\c.\ufe0f" 1 font-shape-gstring]))

so that we only see a composition if the font indeed agrees to
compose.  What do you see?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 29 May 2023 16:14:02 GMT)

Message #89 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 29 May 2023 18:13:14 +0200
>>>>> On Mon, 29 May 2023 17:55:49 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Mon, 29 May 2023 16:43:00 +0200
    >> 
    >> >>>>> On Mon, 29 May 2023 16:58:43 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
    >> 
    >> >> display: composed to form "👍️" (see below)
    >> 
    Eli> This is not what I see.  I didn't use the above set-char-table-range
    Eli> expression literally, but instead started "emacs -Q", and then
    Eli> evaluated in *scratch*:
    >> 
    Eli> (set-char-table-range
    Eli> composition-function-table
    Eli> #xFE0F
    Eli> '(["\\c.\ufe0f" 1 compose-gstring-for-graphic]))
    >> 
    Eli> After that, the sequence U+1F44D U+FE0F displays as a single glyph,
    Eli> and there's no thin space after it.  What am I missing?  Is this
    Eli> somehow specific to ftcrhb font driver or something?
    >> 
    >> Itʼs a single glyph, but that glyph contains a thin space. I used this
    >> to check; the second 'a' is slightly offset:
    >> 
    >> 👍️a
    >> 👍a

    Eli> That's because the first one shows two glyphs that are
    Eli> "pseudo-composed": not by the font, but by our hand-made "composition"
    Eli> in compose-gstring-for-graphic.  Try this instead:

    Eli>       (set-char-table-range
    Eli>        composition-function-table
    Eli>        #xFE0F
    Eli>        '(["\\c.\ufe0f" 1 font-shape-gstring]))

    Eli> so that we only see a composition if the font indeed agrees to
    Eli> compose.  What do you see?

It still displays a single glyph with a thin space. If I customize
`glyphless-char-display-control' to display hex codes for VS, then it
displays a hex box.

So I guess that means weʼre not composing?

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 29 May 2023 17:19:02 GMT)

Message #92 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 29 May 2023 20:18:41 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Mon, 29 May 2023 18:13:14 +0200
> 
> >>>>> On Mon, 29 May 2023 17:55:49 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     Eli> That's because the first one shows two glyphs that are
>     Eli> "pseudo-composed": not by the font, but by our hand-made "composition"
>     Eli> in compose-gstring-for-graphic.  Try this instead:
> 
>     Eli>       (set-char-table-range
>     Eli>        composition-function-table
>     Eli>        #xFE0F
>     Eli>        '(["\\c.\ufe0f" 1 font-shape-gstring]))
> 
>     Eli> so that we only see a composition if the font indeed agrees to
>     Eli> compose.  What do you see?
> 
> It still displays a single glyph with a thin space. If I customize
> `glyphless-char-display-control' to display hex codes for VS, then it
> displays a hex box.
> 
> So I guess that means weʼre not composing?

What does "C-u C-x =" say in this case?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Tue, 30 May 2023 07:27:01 GMT)

Message #95 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Tue, 30 May 2023 09:25:52 +0200
>>>>> On Mon, 29 May 2023 20:18:41 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Mon, 29 May 2023 18:13:14 +0200
    >> 
    >> >>>>> On Mon, 29 May 2023 17:55:49 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
    >> 
    Eli> That's because the first one shows two glyphs that are
    Eli> "pseudo-composed": not by the font, but by our hand-made "composition"
    Eli> in compose-gstring-for-graphic.  Try this instead:
    >> 
    Eli> (set-char-table-range
    Eli> composition-function-table
    Eli> #xFE0F
    Eli> '(["\\c.\ufe0f" 1 font-shape-gstring]))
    >> 
    Eli> so that we only see a composition if the font indeed agrees to
    Eli> compose.  What do you see?
    >> 
    >> It still displays a single glyph with a thin space. If I customize
    >> `glyphless-char-display-control' to display hex codes for VS, then it
    >> displays a hex box.
    >> 
    >> So I guess that means weʼre not composing?

    Eli> What does "C-u C-x =" say in this case?

It claims itʼs composed:

             position: 146 of 251 (58%), column: 0
            character: 👍 (displayed as 👍) (codepoint 128077, #o372115, #x1f44d)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0x1F44D
               script: emoji
               syntax: w 	which means: word
             category: .:Base
             to input: type "C-x 8 RET 1f44d" or "C-x 8 RET THUMBS UP SIGN"
          buffer code: #xF0 #x9F #x91 #x8D
            file code: #xF0 #x9F #x91 #x8D (encoded by coding system utf-8-unix)
              display: composed to form "👍️" (see below)

Composed with the following character(s) "️" using this font:
  ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-13-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 128077 569 16 0 17 13 4 nil]
with these character(s):
  ️ (#xfe0f) VARIATION SELECTOR-16

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Tue, 30 May 2023 12:11:01 GMT)

Message #98 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Tue, 30 May 2023 15:10:45 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Tue, 30 May 2023 09:25:52 +0200
> 
> >>>>> On Mon, 29 May 2023 20:18:41 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     Eli> (set-char-table-range
>     Eli> composition-function-table
>     Eli> #xFE0F
>     Eli> '(["\\c.\ufe0f" 1 font-shape-gstring]))
>     >> 
>     Eli> so that we only see a composition if the font indeed agrees to
>     Eli> compose.  What do you see?
>     >> 
>     >> It still displays a single glyph with a thin space. If I customize
>     >> `glyphless-char-display-control' to display hex codes for VS, then it
>     >> displays a hex box.
>     >> 
>     >> So I guess that means weʼre not composing?
> 
>     Eli> What does "C-u C-x =" say in this case?
> 
> It claims itʼs composed:
> 
>              position: 146 of 251 (58%), column: 0
>             character: 👍 (displayed as 👍) (codepoint 128077, #o372115, #x1f44d)
>               charset: unicode (Unicode (ISO10646))
> code point in charset: 0x1F44D
>                script: emoji
>                syntax: w 	which means: word
>              category: .:Base
>              to input: type "C-x 8 RET 1f44d" or "C-x 8 RET THUMBS UP SIGN"
>           buffer code: #xF0 #x9F #x91 #x8D
>             file code: #xF0 #x9F #x91 #x8D (encoded by coding system utf-8-unix)
>               display: composed to form "👍️" (see below)
> 
> Composed with the following character(s) "️" using this font:
>   ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-13-*-*-*-m-0-iso10646-1
> by these glyphs:
>   [0 1 128077 569 16 0 17 13 4 nil]
> with these character(s):
>   ️ (#xfe0f) VARIATION SELECTOR-16

Which means it _is_ composed.  Moreover, with Noto Color Emoji we get
a single glyph.  On my system, I have Noto Emoji, from which I get two
glyphs:

  [0 1 128077 422 17 1 15 12 2 nil]
  [0 1 65039 3 17 0 1 0 1 [0 0 0]]

(in which case I can understand why the second one is displayed as a
hex box if I customize glyphless-char-display-control).

So, given that this is the case, why is this wrong, again?  If the
font and the shaper produce two glyphs, or one glyph that looks like
two, why should we think it's an Emacs problem?

(I verified that Emacs 28 shows the same, so this is not a recent
regression.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Tue, 30 May 2023 13:32:02 GMT)

Message #101 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Tue, 30 May 2023 15:30:58 +0200
>>>>> On Tue, 30 May 2023 15:10:45 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    Eli> Which means it _is_ composed.  Moreover, with Noto Color Emoji we get
    Eli> a single glyph.  On my system, I have Noto Emoji, from which I get two
    Eli> glyphs:

    Eli>   [0 1 128077 422 17 1 15 12 2 nil]
    Eli>   [0 1 65039 3 17 0 1 0 1 [0 0 0]]

    Eli> (in which case I can understand why the second one is displayed as a
    Eli> hex box if I customize glyphless-char-display-control).

But I also get a hex box if I customize
glyphless-char-display-control, even though 'C-u C-x =' claims thereʼs
only one glyph.

    Eli> So, given that this is the case, why is this wrong, again?  If the
    Eli> font and the shaper produce two glyphs, or one glyph that looks like
    Eli> two, why should we think it's an Emacs problem?

Because Emacs behaves differently depending on whether we have a
composition rule for FE0F that looks backwards or one for 1F44D that
looks forwards. The sequences in both cases are

U+1F44D U+FE0F U+7C U+61
U+1F44D U+7C U+61

(set-char-table-range
 composition-function-table
 #xFE0F
 '(["\\c.\ufe0f" 1 font-shape-gstring]))

produces the following:

[backward-composition.png (image/png, inline)]
There is a (very) thin space that shouldnʼt be there between the 1f44d
and the '|' on the line that has the FE0F (and since it follows the
value of glyphless-char-display-control, I donʼt think
it comes from the shaping engine).

but

(set-char-table-range
 composition-function-table
 #x1F44D
 '(["\U0001f44d\ufe0f" 0 font-shape-gstring]))

gives me this, where the two '|' align perfectly.

[forward-composition.png (image/png, inline)]
(as an experiment, I hacked 'produce_glyphless_glyph' to skip
displaying variation selectors, and the problem disappears).

thanks

Robert

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Tue, 30 May 2023 16:32:01 GMT)

Message #104 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Tue, 30 May 2023 19:32:23 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Tue, 30 May 2023 15:30:58 +0200
> 
> >>>>> On Tue, 30 May 2023 15:10:45 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     Eli> Which means it _is_ composed.  Moreover, with Noto Color Emoji we get
>     Eli> a single glyph.  On my system, I have Noto Emoji, from which I get two
>     Eli> glyphs:
> 
>     Eli>   [0 1 128077 422 17 1 15 12 2 nil]
>     Eli>   [0 1 65039 3 17 0 1 0 1 [0 0 0]]
> 
>     Eli> (in which case I can understand why the second one is displayed as a
>     Eli> hex box if I customize glyphless-char-display-control).
> 
> But I also get a hex box if I customize
> glyphless-char-display-control, even though 'C-u C-x =' claims thereʼs
> only one glyph.
> 
>     Eli> So, given that this is the case, why is this wrong, again?  If the
>     Eli> font and the shaper produce two glyphs, or one glyph that looks like
>     Eli> two, why should we think it's an Emacs problem?
> 
> Because Emacs behaves differently depending on whether we have a
> composition rule for FE0F that looks backwards or one for 1F44D that
> looks forwards. The sequences in both cases are
> 
> U+1F44D U+FE0F U+7C U+61
> U+1F44D U+7C U+61
> 
> (set-char-table-range
>  composition-function-table
>  #xFE0F
>  '(["\\c.\ufe0f" 1 font-shape-gstring]))
> 
> produces the following:
> 
> There is a (very) thin space that shouldnʼt be there between the 1f44d
> and the '|' on the line that has the FE0F (and since it follows the
> value of glyphless-char-display-control, I donʼt think
> it comes from the shaping engine).

OK, here's the scoop: there's no composition there.  "C-u C-x =" says
there is, but that's a lie: when I look in GDB at the glyphs actually
shown there, there are no composition glyphs, only the glyph for U+1F44D
followed by a glyph for U+FE0F.

> but
> 
> (set-char-table-range
>  composition-function-table
>  #x1F44D 
> '(["\U0001f44d\ufe0f" 0 font-shape-gstring]))
> 
> gives me this, where the two '|' align perfectly.

Here, there _is_ a composition.

So there are two issues here: (a) why there's no composition in the
first case, and (b) why does "C-u C-x =" say there is when there
isn't.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Wed, 31 May 2023 16:12:02 GMT)

Message #107 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Wed, 31 May 2023 18:11:36 +0200
>>>>> On Tue, 30 May 2023 19:32:23 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> (set-char-table-range
    >> composition-function-table
    >> #x1F44D 
    >> '(["\U0001f44d\ufe0f" 0 font-shape-gstring]))
    >> 
    >> gives me this, where the two '|' align perfectly.

    Eli> Here, there _is_ a composition.

    Eli> So there are two issues here: (a) why there's no composition in the
    Eli> first case, and (b) why does "C-u C-x =" say there is when there
    Eli> isn't.

OK. I can poke around in gdb if you give me some idea of what I should
be looking at.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Wed, 31 May 2023 16:18:02 GMT)

Message #110 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Wed, 31 May 2023 19:18:22 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Wed, 31 May 2023 18:11:36 +0200
> 
> >>>>> On Tue, 30 May 2023 19:32:23 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     >> (set-char-table-range
>     >> composition-function-table
>     >> #x1F44D 
>     >> '(["\U0001f44d\ufe0f" 0 font-shape-gstring]))
>     >> 
>     >> gives me this, where the two '|' align perfectly.
> 
>     Eli> Here, there _is_ a composition.
> 
>     Eli> So there are two issues here: (a) why there's no composition in the
>     Eli> first case, and (b) why does "C-u C-x =" say there is when there
>     Eli> isn't.
> 
> OK. I can poke around in gdb if you give me some idea of what I should
> be looking at.

I don't really know.  I plan to just step through the code in
composite.c tomorrow, unless you beat me to it.  Once we understand
issue (a), I think we will also understand issue (b).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Thu, 01 Jun 2023 12:43:02 GMT)

Message #113 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: rpluim <at> gmail.com
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Thu, 01 Jun 2023 15:43:26 +0300
> Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
> Date: Wed, 31 May 2023 19:18:22 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > From: Robert Pluim <rpluim <at> gmail.com>
> > Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> > Date: Wed, 31 May 2023 18:11:36 +0200
> > 
> >     Eli> So there are two issues here: (a) why there's no composition in the
> >     Eli> first case, and (b) why does "C-u C-x =" say there is when there
> >     Eli> isn't.
> > 
> > OK. I can poke around in gdb if you give me some idea of what I should
> > be looking at.
> 
> I don't really know.  I plan to just step through the code in
> composite.c tomorrow, unless you beat me to it.  Once we understand
> issue (a), I think we will also understand issue (b).

OK, the issue is quite clear even without stepping with a debugger.

Bottom line: we cannot support a situation where the same character
can be composed by more than one slot in composition-function-table.
If there is more than a single slot for the same character, one of
them will be tried, and the rest will be ignored (not even tried).
In particular, if a character CH has a "forward" composition rule that
starts with itself, and also has a "backward" rule (one with non-zero
look-back parameter) triggered by a different character (which should
follow CH), the latter rule will never be tried.

This is what happens in this case: the character #x1F44D has several
rules that start with itself in emoji-zwj.el:

  (#x1F44D .
  ,(eval-when-compile (regexp-opt
   '(
   "\N{U+1F44D}\N{U+1F3FB}"
   "\N{U+1F44D}\N{U+1F3FC}"
   "\N{U+1F44D}\N{U+1F3FD}"
   "\N{U+1F44D}\N{U+1F3FE}"
   "\N{U+1F44D}\N{U+1F3FF}"
   ))))

and it also has a "backward" rule:

  (set-char-table-range
   composition-function-table
   #xFE0F '(["\\c.\ufe0f" 1 font-shape-gstring]))

The latter is triggered by #xFE0F and has a 1-character look-back,
which will match #x1F44D, since its category is '.' (it's a "base
character").  This latter rule is never tried.  Why?  Because the
former rules, anchored at #x1F44D, are tried first (Emacs redisplay
examines characters in the order of their buffer positions), and fail
to match.  When those rules fail to match, due to how the
composition-related functions called by the display engine are
factored, we never again consider compositions triggered by a later
character which also "cover" #x1F44D: once that position was examined
and the attempted composition failed, we move to the next character.
IOW, we assume that the first set of composition rules we find for a
given character is the only one that could possibly be relevant for
that character.

Which means that to have #xFE0F compose correctly with Emoji
codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.

The reason why "C-u C-x =" lies to us saying there's a composition
where really there isn't is because descr-text.el uses the
find-composition primitive, whose implementation is parallel and
separate from that of the display-engine routines, and is structured
differently.  So find-composition does succeed in detecting the second
rule, the one triggered by #xFE0F, which the display engine ignores.
I will think whether this can be fixed, to avoid such false positives,
but if we accept that there can be only one set of composition rules
for a character, then we basically invoked undefined behavior here,
and we got what we deserved.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Thu, 01 Jun 2023 13:31:03 GMT)

Message #116 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Thu, 01 Jun 2023 15:30:18 +0200
>>>>> On Thu, 01 Jun 2023 15:43:26 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
    >> Date: Wed, 31 May 2023 19:18:22 +0300
    >> From: Eli Zaretskii <eliz <at> gnu.org>
    >> 
    >> > From: Robert Pluim <rpluim <at> gmail.com>
    >> > Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> > Date: Wed, 31 May 2023 18:11:36 +0200
    >> > 
    >> >     Eli> So there are two issues here: (a) why there's no composition in the
    >> >     Eli> first case, and (b) why does "C-u C-x =" say there is when there
    >> >     Eli> isn't.
    >> > 
    >> > OK. I can poke around in gdb if you give me some idea of what I should
    >> > be looking at.
    >> 
    >> I don't really know.  I plan to just step through the code in
    >> composite.c tomorrow, unless you beat me to it.  Once we understand
    >> issue (a), I think we will also understand issue (b).

    Eli> OK, the issue is quite clear even without stepping with a debugger.

    Eli> Bottom line: we cannot support a situation where the same character
    Eli> can be composed by more than one slot in composition-function-table.
    Eli> If there is more than a single slot for the same character, one of
    Eli> them will be tried, and the rest will be ignored (not even tried).
    Eli> In particular, if a character CH has a "forward" composition rule that
    Eli> starts with itself, and also has a "backward" rule (one with non-zero
    Eli> look-back parameter) triggered by a different character (which should
    Eli> follow CH), the latter rule will never be tried.

OK, that makes sense. Where would be a good place to document this?

    Eli> This is what happens in this case: the character #x1F44D has several
    Eli> rules that start with itself in emoji-zwj.el:

    Eli>   (#x1F44D .
    Eli>   ,(eval-when-compile (regexp-opt
    Eli>    '(
    Eli>    "\N{U+1F44D}\N{U+1F3FB}"
    Eli>    "\N{U+1F44D}\N{U+1F3FC}"
    Eli>    "\N{U+1F44D}\N{U+1F3FD}"
    Eli>    "\N{U+1F44D}\N{U+1F3FE}"
    Eli>    "\N{U+1F44D}\N{U+1F3FF}"
    Eli>    ))))

    Eli> and it also has a "backward" rule:

    Eli>   (set-char-table-range
    Eli>    composition-function-table
    Eli>    #xFE0F '(["\\c.\ufe0f" 1 font-shape-gstring]))

    Eli> The latter is triggered by #xFE0F and has a 1-character look-back,
    Eli> which will match #x1F44D, since its category is '.' (it's a "base
    Eli> character").  This latter rule is never tried.  Why?  Because the
    Eli> former rules, anchored at #x1F44D, are tried first (Emacs redisplay
    Eli> examines characters in the order of their buffer positions), and fail
    Eli> to match.  When those rules fail to match, due to how the
    Eli> composition-related functions called by the display engine are
    Eli> factored, we never again consider compositions triggered by a later
    Eli> character which also "cover" #x1F44D: once that position was examined
    Eli> and the attempted composition failed, we move to the next character.
    Eli> IOW, we assume that the first set of composition rules we find for a
    Eli> given character is the only one that could possibly be relevant for
    Eli> that character.

    Eli> Which means that to have #xFE0F compose correctly with Emoji
    Eli> codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.

Thatʼs easy enough:

diff --git a/admin/unidata/emoji-zwj.awk b/admin/unidata/emoji-zwj.awk
index 7d2ff6cb900..d1195ebbad8 100644
--- a/admin/unidata/emoji-zwj.awk
+++ b/admin/unidata/emoji-zwj.awk
@@ -106,7 +106,8 @@ END {
 
      for (elt in ch)
     {
-        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, vec[elt])
+        entries = sprintf("%s\n\"\\N{U+%s}\\N{U+FE0F}\"", vec[elt], elt)
+        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, entries)
     }
      print "))"
      print "  (set-char-table-range composition-function-table"

That makes all the VS-16 sequences in
admin/unidata/emoji-variation-sequences.txt display with the emoji
font for me.

    Eli> The reason why "C-u C-x =" lies to us saying there's a composition
    Eli> where really there isn't is because descr-text.el uses the
    Eli> find-composition primitive, whose implementation is parallel and
    Eli> separate from that of the display-engine routines, and is structured
    Eli> differently.  So find-composition does succeed in detecting the second
    Eli> rule, the one triggered by #xFE0F, which the display engine ignores.
    Eli> I will think whether this can be fixed, to avoid such false positives,
    Eli> but if we accept that there can be only one set of composition rules
    Eli> for a character, then we basically invoked undefined behavior here,
    Eli> and we got what we deserved.

If find-composition DTRT, could we not use it in the display engine?

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Thu, 01 Jun 2023 16:10:02 GMT)

Message #119 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Thu, 01 Jun 2023 19:10:16 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Thu, 01 Jun 2023 15:30:18 +0200
> 
>     Eli> OK, the issue is quite clear even without stepping with a debugger.
> 
>     Eli> Bottom line: we cannot support a situation where the same character
>     Eli> can be composed by more than one slot in composition-function-table.
>     Eli> If there is more than a single slot for the same character, one of
>     Eli> them will be tried, and the rest will be ignored (not even tried).
>     Eli> In particular, if a character CH has a "forward" composition rule that
>     Eli> starts with itself, and also has a "backward" rule (one with non-zero
>     Eli> look-back parameter) triggered by a different character (which should
>     Eli> follow CH), the latter rule will never be tried.
> 
> OK, that makes sense. Where would be a good place to document this?

In the doc string of composition-function-table, I think.  We already
document there the caveat of arranging rules in descending order of
look-back, which is part of the same "misfeature".

>     Eli> Which means that to have #xFE0F compose correctly with Emoji
>     Eli> codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.
> 
> Thatʼs easy enough:
> 
> diff --git a/admin/unidata/emoji-zwj.awk b/admin/unidata/emoji-zwj.awk
> index 7d2ff6cb900..d1195ebbad8 100644
> --- a/admin/unidata/emoji-zwj.awk
> +++ b/admin/unidata/emoji-zwj.awk
> @@ -106,7 +106,8 @@ END {
>  
>       for (elt in ch)
>      {
> -        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, vec[elt])
> +        entries = sprintf("%s\n\"\\N{U+%s}\\N{U+FE0F}\"", vec[elt], elt)
> +        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, entries)
>      }
>       print "))"
>       print "  (set-char-table-range composition-function-table"
> 
> That makes all the VS-16 sequences in
> admin/unidata/emoji-variation-sequences.txt display with the emoji
> font for me.

Ready to install this on the emacs-29 branch?

>     Eli> The reason why "C-u C-x =" lies to us saying there's a composition
>     Eli> where really there isn't is because descr-text.el uses the
>     Eli> find-composition primitive, whose implementation is parallel and
>     Eli> separate from that of the display-engine routines, and is structured
>     Eli> differently.  So find-composition does succeed in detecting the second
>     Eli> rule, the one triggered by #xFE0F, which the display engine ignores.
>     Eli> I will think whether this can be fixed, to avoid such false positives,
>     Eli> but if we accept that there can be only one set of composition rules
>     Eli> for a character, then we basically invoked undefined behavior here,
>     Eli> and we got what we deserved.
> 
> If find-composition DTRT, could we not use it in the display engine?

Not easily, because the display code calls subroutines of
find-composition in a certain order, and that's what causes the
behavior I described.

And even if we could make this happen, I'm not sure we should:
basically, having multiple matching slots would mean users and callers
will never be sure which one "wins".




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Thu, 01 Jun 2023 16:36:02 GMT)

Message #122 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Thu, 01 Jun 2023 18:34:53 +0200
>>>>> On Thu, 01 Jun 2023 19:10:16 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Thu, 01 Jun 2023 15:30:18 +0200
    >> 
    Eli> OK, the issue is quite clear even without stepping with a debugger.
    >> 
    Eli> Bottom line: we cannot support a situation where the same character
    Eli> can be composed by more than one slot in composition-function-table.
    Eli> If there is more than a single slot for the same character, one of
    Eli> them will be tried, and the rest will be ignored (not even tried).
    Eli> In particular, if a character CH has a "forward" composition rule that
    Eli> starts with itself, and also has a "backward" rule (one with non-zero
    Eli> look-back parameter) triggered by a different character (which should
    Eli> follow CH), the latter rule will never be tried.
    >> 
    >> OK, that makes sense. Where would be a good place to document this?

    Eli> In the doc string of composition-function-table, I think.  We already
    Eli> document there the caveat of arranging rules in descending order of
    Eli> look-back, which is part of the same "misfeature".

OK. Iʼll see if I can come up with something (or Iʼll just steal what
you wrote above :-)).

    >> That makes all the VS-16 sequences in
    >> admin/unidata/emoji-variation-sequences.txt display with the emoji
    >> font for me.

    Eli> Ready to install this on the emacs-29 branch?

Not today. My brain is fuzzy, and it needs more testing (the patch,
not my brain).

    >> If find-composition DTRT, could we not use it in the display engine?

    Eli> Not easily, because the display code calls subroutines of
    Eli> find-composition in a certain order, and that's what causes the
    Eli> behavior I described.

    Eli> And even if we could make this happen, I'm not sure we should:
    Eli> basically, having multiple matching slots would mean users and callers
    Eli> will never be sure which one "wins".

Yes, at least the semantics are clear (now that we know what they
are).

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 02 Jun 2023 08:16:01 GMT) Full text and rfc822 format available.

Message #125 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 02 Jun 2023 10:15:08 +0200
>>>>> On Thu, 01 Jun 2023 18:34:53 +0200, Robert Pluim <rpluim <at> gmail.com> said:

    Eli> Ready to install this on the emacs-29 branch?

    Robert> Not today. My brain is fuzzy, and it needs more testing (the patch,
    Robert> not my brain).

So the minimal change to get CHAR+VS-15 and CHAR+VS-16 to compose in
all our emoji test files is below. I noticed that we donʼt compose all
the sequences in emoji-test.txt correctly, but Iʼll fix that on master
by stealing^Wdrawing inspiration from Larsʼ work.

Proper VS-15 support is harder, I need to think about that some more.

diff --git c/admin/unidata/emoji-zwj.awk i/admin/unidata/emoji-zwj.awk
index 7d2ff6cb900..f13f796bcac 100644
--- c/admin/unidata/emoji-zwj.awk
+++ i/admin/unidata/emoji-zwj.awk
@@ -106,7 +106,8 @@ END {
 
      for (elt in ch)
     {
-        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, vec[elt])
+        entries = sprintf("%s\n\"\\N{U+%s}\\N{U+FE0E}\"\n\"\\N{U+%s}\\N{U+FE0F}\"", vec[elt], elt, elt)
+        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, entries)
     }
      print "))"
      print "  (set-char-table-range composition-function-table"
diff --git c/lisp/composite.el i/lisp/composite.el
index fb8b76114f4..9710c3c371b 100644
--- c/lisp/composite.el
+++ i/lisp/composite.el
@@ -861,7 +861,7 @@ compose-gstring-for-variation-glyph
 ;; handled in font_range, we end up choosing the Emoji presentation
 ;; rather than the Text presentation.
 (let ((elt '([".." 1 compose-gstring-for-variation-glyph])))
-  (set-char-table-range composition-function-table '(#xFE00 . #xFE0E) elt)
+  (set-char-table-range composition-function-table '(#xFE00 . #xFE0D) elt)
   (set-char-table-range composition-function-table '(#xE0100 . #xE01EF) elt))
 
 (defun auto-compose-chars (func from to font-object string direction)
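
A note on the awk hunk above: the new sprintf() call appends two extra entries to each codepoint's regexp-opt list, CHAR followed by VS-15 (U+FE0E) and CHAR followed by VS-16 (U+FE0F). The C sketch below reproduces that string construction with the same format string. The function name add_vs_entries is hypothetical (it is not part of emoji-zwj.awk), and the sketch assumes the codepoint arrives as the 4-to-6-digit hex string used in the Unicode data files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the sprintf() added to
   admin/unidata/emoji-zwj.awk: append the CHAR+VS-15 and CHAR+VS-16
   variants of codepoint ELT (a hex string such as "261D") to the
   regexp-opt entries already collected in EXISTING.  The result goes
   into BUF; the return value is snprintf's, i.e. the length the full
   output needs, excluding the terminating NUL.  */
int
add_vs_entries (char *buf, size_t bufsz, const char *existing,
                const char *elt)
{
  /* Unicode data files write codepoints as 4 to 6 hex digits.  */
  assert (strlen (elt) >= 4 && strlen (elt) <= 6);
  return snprintf (buf, bufsz,
                   "%s\n\"\\N{U+%s}\\N{U+FE0E}\"\n\"\\N{U+%s}\\N{U+FE0F}\"",
                   existing, elt, elt);
}
```

Called with EXISTING holding the single entry "\N{U+261D}" and ELT "261D", the buffer ends up with three quoted lines: "\N{U+261D}", "\N{U+261D}\N{U+FE0E}" and "\N{U+261D}\N{U+FE0F}", which is the list regexp-opt would then receive for that slot.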


Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 02 Jun 2023 12:07:02 GMT) Full text and rfc822 format available.

Message #128 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 02 Jun 2023 15:06:32 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Fri, 02 Jun 2023 10:15:08 +0200
> 
> >>>>> On Thu, 01 Jun 2023 18:34:53 +0200, Robert Pluim <rpluim <at> gmail.com> said:
> 
>     Eli> Ready to install this on the emacs-29 branch?
> 
>     Robert> Not today. My brain is fuzzy, and it needs more testing (the patch,
>     Robert> not my brain).
> 
> So the minimal change to get CHAR+VS-15 and CHAR+VS-16 to compose in
> all our emoji test files is below. I noticed that we donʼt compose all
> the sequences in emoji-test.txt correctly, but Iʼll fix that on master
> by stealing^Wdrawing inspiration from Larsʼ work.

Thanks, please install this on the emacs-29 branch.

> Proper VS-15 support is harder, I need to think about that some more.

Can you describe here the current problems with VS-15?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 02 Jun 2023 12:26:02 GMT) Full text and rfc822 format available.

Message #131 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 02 Jun 2023 14:25:28 +0200
tags 63731 fixed
close 63731 29.1
quit

>>>>> On Fri, 02 Jun 2023 15:06:32 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    Eli> Thanks, please install this on the emacs-29 branch.

Closing.
Committed as 2f94f6de9d6

    >> Proper VS-15 support is harder, I need to think about that some more.

    Eli> Can you describe here the current problems with VS-15?

CHAR+VS-15 and CHAR+VS-16 correctly choose text and emoji
representation, but CHAR+VS-15 results in the text representation only
if CHAR is not an emoji. If it is an emoji, the font selected for it
will always be the emoji font.

Iʼve tried forcing font_range to use the font for the 'symbol' script
for EMOJI+VS-15, instead, but that resulted in composition
failing. Maybe there are some more dragons lurking in the composition
rules.

Robert
-- 




Added tag(s) fixed. Request was from Robert Pluim <rpluim <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 02 Jun 2023 12:26:02 GMT) Full text and rfc822 format available.

bug marked as fixed in version 29.1, send any further explanations to 63731 <at> debbugs.gnu.org and Steven Allen <steven <at> stebalien.com> Request was from Robert Pluim <rpluim <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 02 Jun 2023 12:26:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 02 Jun 2023 12:58:01 GMT) Full text and rfc822 format available.

Message #138 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 02 Jun 2023 15:58:05 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Fri, 02 Jun 2023 14:25:28 +0200
> 
>     Eli> Thanks, please install this on the emacs-29 branch.
> 
> Closing.
> Committed as 2f94f6de9d6

Thanks.

>     >> Proper VS-15 support is harder, I need to think about that some more.
> 
>     Eli> Can you describe here the current problems with VS-15?
> 
> CHAR+VS-15 and CHAR+VS-16 correctly choose text and emoji
> representation, but CHAR+VS-15 results in the text representation only
> if CHAR is not an emoji. If it is an emoji, the font selected for it
> will always be the emoji font.

And an Emoji font, when presented with CHAR+VS-15 sequence doesn't
produce a textual-representation glyph for CHAR?  I'd expect it to.

If Emoji fonts don't produce textual-representation glyphs in this
case, I wonder how this can work at all.  Because if we select some
non-Emoji font, it will probably not know about VS-15, so we will be
left with VS-15.  Are we supposed to handle that ourselves, instead of
relying on the font and the shaping engine?

> Iʼve tried forcing font_range to use the font for the 'symbol' script
> for EMOJI+VS-15, instead, but that resulted in composition
> failing.

That's what I'd expect: non-Emoji fonts don't know about VS-15.

What does HarfBuzz's hb-view do with such sequences, when using Noto
Color Emoji font?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Fri, 02 Jun 2023 13:59:01 GMT) Full text and rfc822 format available.

Message #141 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Fri, 02 Jun 2023 15:58:37 +0200
>>>>> On Fri, 02 Jun 2023 15:58:05 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> CHAR+VS-15 and CHAR+VS-16 correctly choose text and emoji
    >> representation, but CHAR+VS-15 results in the text representation only
    >> if CHAR is not an emoji. If it is an emoji, the font selected for it
    >> will always be the emoji font.

    Eli> And an Emoji font, when presented with CHAR+VS-15 sequence doesn't
    Eli> produce a textual-representation glyph for CHAR?  I'd expect it to.

No.

    Eli> If Emoji fonts don't produce textual-representation glyphs in this
    Eli> case, I wonder how this can work at all.  Because if we select some
    Eli> non-Emoji font, it will probably not know about VS-15, so we will be
    Eli> left with VS-15.  Are we supposed to handle that ourselves, instead of
    Eli> relying on the font and the shaping engine?

    >> Iʼve tried forcing font_range to use the font for the 'symbol' script
    >> for EMOJI+VS-15, instead, but that resulted in composition
    >> failing.

Itʼs finding what appears to be the default system font, not whatʼs
specified in the fontset for 'symbol', so thatʼs one reason why
composition fails. That happens even with 'use-default-font-for-symbols'
set to nil.

    Eli> That's what I'd expect: non-Emoji fonts don't know about VS-15.

Right.

    Eli> What does HarfBuzz's hb-view do with such sequences, when using Noto
    Eli> Color Emoji font?

Sequence       Font             Result
23e9 fe0e      system           black box
23e9 fe0e      Symbola          correct text representation
23e9 fe0e      NotoEmoji        correct text representation
23e9 fe0e      NotoColorEmoji   blank

And on emacs-29, Symbola and NotoEmoji compose that sequence
correctly. Now I just need to persuade emacs-30 to use one of them.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Sat, 03 Jun 2023 05:37:02 GMT) Full text and rfc822 format available.

Message #144 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Sat, 03 Jun 2023 08:36:59 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Fri, 02 Jun 2023 15:58:37 +0200
> 
>     Eli> What does HarfBuzz's hb-view do with such sequences, when using Noto
>     Eli> Color Emoji font?
> 
> Sequence       Font             Result
> 23e9 fe0e      system           black box
> 23e9 fe0e      Symbola          correct text representation
> 23e9 fe0e      NotoEmoji        correct text representation
> 23e9 fe0e      NotoColorEmoji   blank
> 
> And on emacs-29, Symbola and NotoEmoji compose that sequence
> correctly. Now I just need to persuade emacs-30 to use one of them.

So you are saying that, in our default fontset, we should specify that
#xFE0E should be displayed by Noto Emoji (with Symbola as fallback),
and then make sure that font_range uses the same font for the likes of
#x23E9?  IOW, specify a different font for VS-15 even though its script
is 'emoji'?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 13:09:01 GMT) Full text and rfc822 format available.

Message #147 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 15:08:08 +0200
>>>>> On Sat, 03 Jun 2023 08:36:59 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Fri, 02 Jun 2023 15:58:37 +0200
    >> 
    Eli> What does HarfBuzz's hb-view do with such sequences, when using Noto
    Eli> Color Emoji font?
    >> 
    >> Sequence       Font             Result
    >> 23e9 fe0e      system           black box
    >> 23e9 fe0e      Symbola          correct text representation
    >> 23e9 fe0e      NotoEmoji        correct text representation
    >> 23e9 fe0e      NotoColorEmoji   blank
    >> 
    >> And on emacs-29, Symbola and NotoEmoji compose that sequence
    >> correctly. Now I just need to persuade emacs-30 to use one of them.

    Eli> So you are saying that, in our default fontset, we should specify that
    Eli> #xFE0E should be displayed by Noto Emoji (with Symbola as fallback),
    Eli> and then make sure that font_range uses the same font for the likes of
    Eli> #x23E9?  IOW, specify a different font for VS-15 even though its script
    Eli> is 'emoji'?

Yes, that works (and we can remove VS-15 and VS-16 from the emoji
script, so that theyʼll then be displayed via
`glyphless-char-display-control' when theyʼre on their own).

Thanks for the suggestion, Eli. I was looking at it from the wrong
direction.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 13:13:02 GMT) Full text and rfc822 format available.

Message #150 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 16:12:20 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Mon, 05 Jun 2023 15:08:08 +0200
> 
> >>>>> On Sat, 03 Jun 2023 08:36:59 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     >> Sequence       Font             Result
>     >> 23e9 fe0e      system           black box
>     >> 23e9 fe0e      Symbola          correct text representation
>     >> 23e9 fe0e      NotoEmoji        correct text representation
>     >> 23e9 fe0e      NotoColorEmoji   blank
>     >> 
>     >> And on emacs-29, Symbola and NotoEmoji compose that sequence
>     >> correctly. Now I just need to persuade emacs-30 to use one of them.
> 
>     Eli> So you are saying that, in our default fontset, we should specify that
>     Eli> #xFE0E should be displayed by Noto Emoji (with Symbola as fallback),
>     Eli> and then make sure that font_range uses the same font for the likes of
>     Eli> #x23E9?  IOW, specify a different font for VS-15 even though its script
>     Eli> is 'emoji'?
> 
> Yes, that works (and we can remove VS-15 and VS-16 from the emoji
> script, so that theyʼll then be displayed via
> `glyphless-char-display-control' when theyʼre on their own).

What about the rest of VS-nn? Do they need to stay in 'emoji' script,
and if so, why?

> Thanks for the suggestion, Eli. I was looking at it from the wrong
> direction.

You are the one who did most of the footwork, so kudos to you.

This is simple enough to install on emacs-29, I think?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 13:33:01 GMT) Full text and rfc822 format available.

Message #153 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: rpluim <at> gmail.com
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 16:31:58 +0300
> Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
> Date: Mon, 05 Jun 2023 16:12:20 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> >     Eli> So you are saying that, in our default fontset, we should specify that
> >     Eli> #xFE0E should be displayed by Noto Emoji (with Symbola as fallback),
> >     Eli> and then make sure that font_range uses the same font for the likes of
> >     Eli> #x23E9?  IOW, specify a different font for VS-15 even though its script
> >     Eli> is 'emoji'?
> > 
> > Yes, that works (and we can remove VS-15 and VS-16 from the emoji
> > script, so that theyʼll then be displayed via
> > `glyphless-char-display-control' when theyʼre on their own).
> 
> What about the rest of VS-nn? Do they need to stay in 'emoji' script,
> and if so, why?

And one more question: if we remove VS-16 from the emoji script, what
will happen to the sequences like U+23E9 U+FE0F?  Isn't it true that
we use a color Emoji font for those because VS-16 is in emoji script?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 13:37:02 GMT) Full text and rfc822 format available.

Message #156 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 15:36:52 +0200
>>>>> On Mon, 05 Jun 2023 16:12:20 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Mon, 05 Jun 2023 15:08:08 +0200
    >> 
    >> >>>>> On Sat, 03 Jun 2023 08:36:59 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
    >> 
    >> >> Sequence       Font             Result
    >> >> 23e9 fe0e      system           black box
    >> >> 23e9 fe0e      Symbola          correct text representation
    >> >> 23e9 fe0e      NotoEmoji        correct text representation
    >> >> 23e9 fe0e      NotoColorEmoji   blank
    >> >> 
    >> >> And on emacs-29, Symbola and NotoEmoji compose that sequence
    >> >> correctly. Now I just need to persuade emacs-30 to use one of them.
    >> 
    Eli> So you are saying that, in our default fontset, we should specify that
    Eli> #xFE0E should be displayed by Noto Emoji (with Symbola as fallback),
    Eli> and then make sure that font_range uses the same font for the likes of
    Eli> #x23E9?  IOW, specify a different font for VS-15 even though its script
    Eli> is 'emoji'?
    >> 
    >> Yes, that works (and we can remove VS-15 and VS-16 from the emoji
    >> script, so that theyʼll then be displayed via
    >> `glyphless-char-display-control' when theyʼre on their own).

    Eli> What about the rest of VS-nn? Do they need to stay in 'emoji' script,
    Eli> and if so, why?

They were never in the 'emoji' script anyway.

    >> Thanks for the suggestion, Eli. I was looking at it from the wrong
    >> direction.

    Eli> You are the one who did most of the footwork, so kudos to you.

    Eli> This is simple enough to install on emacs-29, I think?

The main change is in font.c, and looks like this. I think itʼs too
big for emacs-29 (breaking composition is very easy, itʼs entirely
possible Iʼve missed a few cases :-) )

diff --git a/src/font.c b/src/font.c
index e586277a5d3..30b088c818e 100644
--- a/src/font.c
+++ b/src/font.c
@@ -3633,10 +3633,14 @@ font_at (int c, ptrdiff_t pos, struct face *face, struct window *w,
 /* Check if CH is a codepoint for which we should attempt to use the
    emoji font, even if the codepoint itself has Emoji_Presentation =
    No.  Vauto_composition_emoji_eligible_codepoints is filled in for
-   us by admin/unidata/emoji-zwj.awk.  */
+   us by admin/unidata/emoji-zwj.awk.  We also check if there's a
+   VS-15 or VS-16 following CH, and select text/emoji presentation
+   respectively if so.  */
 static bool
-codepoint_is_emoji_eligible (int ch)
+codepoint_is_font_change_eligible (int ch, int next_c)
 {
+  if (next_c == 0xFE0E || next_c == 0xFE0F)
+    return true;
   if (EQ (CHAR_TABLE_REF (Vchar_script_table, ch), Qemoji))
     return true;
 
@@ -3690,21 +3694,43 @@ font_range (ptrdiff_t pos, ptrdiff_t pos_byte, ptrdiff_t *limit,
 	}
       face = FACE_FROM_ID (f, face_id);
     }
-
-  /* If the composition was triggered by an emoji, use a character
-     from 'script-representative-chars', rather than the first
-     character in the string, to determine the font to use.  */
-  if (codepoint_is_emoji_eligible (ch))
+  int next_c = 0;
+  {
+    ptrdiff_t p = pos;
+    ptrdiff_t p_b = pos_byte;
+    int c;
+    c = (NILP (string)
+	 ? fetch_char_advance_no_check (&p, &p_b)
+	 : fetch_string_char_advance_no_check (string, &p, &p_b));
+    if (p < *limit)
+      {
+	c = (NILP (string)
+	     ? fetch_char_advance_no_check (&p, &p_b)
+	     : fetch_string_char_advance_no_check (string, &p, &p_b));
+	next_c = c;
+      }
+  }
+  if (codepoint_is_font_change_eligible (ch, next_c))
     {
-      Lisp_Object val = assq_no_quit (Qemoji, Vscript_representative_chars);
-      if (CONSP (val))
+      if (next_c == 0xFE0E)
 	{
-	  val = XCDR (val);
+	  font_object = font_for_char (face, 0xFE0E, pos, string);
+	}
+      else
+	{
+	  /* If the composition was triggered by an emoji, use a character
+	     from 'script-representative-chars', rather than the first
+	     character in the string, to determine the font to use.  */
+	  Lisp_Object val = assq_no_quit (Qemoji, Vscript_representative_chars);
 	  if (CONSP (val))
-	    val = XCAR (val);
-	  else if (VECTORP (val))
-	    val = AREF (val, 0);
-	  font_object = font_for_char (face, XFIXNAT (val), pos, string);
+	    {
+	      val = XCDR (val);
+	      if (CONSP (val))
+		val = XCAR (val);
+	      else if (VECTORP (val))
+		val = AREF (val, 0);
+	      font_object = font_for_char (face, XFIXNAT (val), pos, string);
+	    }
 	}
     }
 

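Reduced to its decision logic, the font.c hunk above peeks at the character after CH (only if it lies before the composition limit) and then: VS-15 selects the font the fontset specifies for U+FE0E; VS-16, or an emoji-eligible CH with no selector, selects the font for the 'emoji' script's representative character; anything else keeps the font already chosen. The sketch below is a simplified model over a plain array of codepoints rather than Emacs's buffer and string accessors. The names next_codepoint, choose_font and enum font_choice are hypothetical, and the ch_is_emoji_eligible flag stands in for the script-table and Vauto_composition_emoji_eligible_codepoints checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three outcomes in the patched font_range.  */
enum font_choice
{
  FONT_UNCHANGED,   /* keep the font already chosen for CH */
  FONT_FOR_VS15,    /* font the fontset gives U+FE0E (text presentation) */
  FONT_FOR_EMOJI    /* font for the 'emoji' representative character */
};

/* Return the codepoint after POS, or 0 if POS + 1 is not before LIMIT.
   This mirrors the "p < *limit" test in the patch, where next_c also
   defaults to 0.  */
int
next_codepoint (const int *text, ptrdiff_t pos, ptrdiff_t limit)
{
  assert (pos >= 0 && pos < limit);
  return pos + 1 < limit ? text[pos + 1] : 0;
}

/* Model of codepoint_is_font_change_eligible plus the branch on next_c.  */
enum font_choice
choose_font (int next_c, bool ch_is_emoji_eligible)
{
  bool eligible = (next_c == 0xFE0E || next_c == 0xFE0F
                   || ch_is_emoji_eligible);
  if (!eligible)
    return FONT_UNCHANGED;
  return next_c == 0xFE0E ? FONT_FOR_VS15 : FONT_FOR_EMOJI;
}
```

In this model U+23E9 U+FE0E yields FONT_FOR_VS15 even though U+23E9 is emoji-eligible (the case the emacs-29 code could not handle), while U+203C U+FE0F yields FONT_FOR_EMOJI.
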
Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 13:48:02 GMT) Full text and rfc822 format available.

Message #159 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 16:47:22 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Mon, 05 Jun 2023 15:36:52 +0200
> 
> >>>>> On Mon, 05 Jun 2023 16:12:20 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     Eli> This is simple enough to install on emacs-29, I think?
> 
> The main change is in font.c, and looks like this. I think itʼs too
> big for emacs-29 (breaking composition is very easy, itʼs entirely
> possible Iʼve missed a few cases :-) )

Hmm... I thought just changing the fontset in fontset.el would be
enough.

OK, so I guess master it is, then.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 14:07:01 GMT) Full text and rfc822 format available.

Message #162 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 16:06:39 +0200
>>>>> On Mon, 05 Jun 2023 16:31:58 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
    >> Date: Mon, 05 Jun 2023 16:12:20 +0300
    >> From: Eli Zaretskii <eliz <at> gnu.org>
    >> 
    >> >     Eli> So you are saying that, in our default fontset, we should specify that
    >> >     Eli> #xFE0E should be displayed by Noto Emoji (with Symbola as fallback),
    >> >     Eli> and then make sure that font_range uses the same font for the likes of
    >> >     Eli> #x23E9?  IOW, specify a different font for VS-15 even though its script
    >> >     Eli> is 'emoji'?
    >> > 
    >> > Yes, that works (and we can remove VS-15 and VS-16 from the emoji
    >> > script, so that theyʼll then be displayed via
    >> > `glyphless-char-display-control' when theyʼre on their own).
    >> 
    >> What about the rest of VS-nn? Do they need to stay in 'emoji' script,
    >> and if so, why?

    Eli> And one more question: if we remove VS-16 from the emoji script, what
    Eli> will happen to the sequences like U+23E9 U+FE0F?  Isn't it true that
    Eli> we use a color Emoji font for those because VS-16 is in emoji script?

Not anymore. Now we have a forward composition rule for U+23E9
U+FE0F that triggers because U+23E9 is in the emoji script, which is
why U+23E9 U+FE0E also uses the emoji font (currently).

For non-emoji codepoints like U+203C, adding U+FE0F uses the emoji
font because U+FE0F is in the emoji script (and thereʼs no composition
rule for U+203C, so the backwards looking one for U+FE0F is used).
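
Both cases follow from the behaviour Eli described earlier in the thread: per character, only one slot is consulted, so a forward rule owned by CH shadows any backward (look-back 1) rule held by the character after it. The C sketch below is a deliberately simplified, hypothetical model of that lookup; struct slot, find_slot and compose_at are invented names, and the real composition-function-table holds regexps and composition functions, not the follower lists used here. As a further simplification, the backward rule accepts any predecessor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-rule-per-slot table entry.  A forward rule
   (lookback 0) lives in the slot of the first character and lists the
   characters allowed to follow it; a backward rule (lookback 1) lives
   in the slot of the second character and, in this simplified model,
   accepts any predecessor.  */
struct slot
{
  int ch;
  int lookback;
  const int *followers;   /* 0-terminated; used only when lookback == 0 */
};

static const struct slot *
find_slot (const struct slot *table, size_t n, int ch, int lookback)
{
  for (size_t i = 0; i < n; i++)
    if (table[i].ch == ch && table[i].lookback == lookback)
      return &table[i];
  return NULL;
}

/* Return how many characters compose starting at TEXT[POS] (2 or 0).
   If TEXT[POS] owns a forward rule, that is the only rule tried;
   otherwise the following character's backward rule gets a chance.  */
int
compose_at (const int *text, size_t len, size_t pos,
            const struct slot *table, size_t n)
{
  assert (pos < len);
  if (pos + 1 >= len)
    return 0;
  const struct slot *fwd = find_slot (table, n, text[pos], 0);
  if (fwd)
    {
      for (const int *f = fwd->followers; *f; f++)
        if (*f == text[pos + 1])
          return 2;
      return 0;   /* the next character's backward rule is never tried */
    }
  return find_slot (table, n, text[pos + 1], 1) ? 2 : 0;
}
```

With a forward rule for U+23E9 listing U+FE0E and U+FE0F, a forward rule for U+261D listing only skin-tone modifiers, and a backward rule for U+FE0F, the model composes U+203C U+FE0F through U+FE0F's slot and U+23E9 U+FE0F through U+23E9's own slot. It fails to compose U+261D U+FE0F, which is why the awk change appends the VS-15/VS-16 variants to each forward rule.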

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 14:28:02 GMT) Full text and rfc822 format available.

Message #165 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 16:27:28 +0200
>>>>> On Mon, 05 Jun 2023 16:47:22 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Mon, 05 Jun 2023 15:36:52 +0200
    >> 
    >> >>>>> On Mon, 05 Jun 2023 16:12:20 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
    >> 
    Eli> This is simple enough to install on emacs-29, I think?
    >> 
    >> The main change is in font.c, and looks like this. I think itʼs too
    >> big for emacs-29 (breaking composition is very easy, itʼs entirely
    >> possible Iʼve missed a few cases :-) )

    Eli> Hmm... I thought just changing the fontset in fontset.el would be
    Eli> enough.

Itʼs almost enough to do that, and to check if the triggering
character is U+FE0E, but then we fall foul of the composition rule
forward/backward issue again.

If we could have forward- and backward-looking rules working together,
then font_range would get passed U+FE0F or U+FE0E as the triggering
character, it could choose the font, and there would be no need to
peek at the next character.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 15:36:01 GMT) Full text and rfc822 format available.

Message #168 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 18:35:37 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Mon, 05 Jun 2023 16:27:28 +0200
> 
>     Eli> Hmm... I thought just changing the fontset in fontset.el would be
>     Eli> enough.
> 
> Itʼs almost enough to do that, and to check if the triggering
> character is U+FE0E, but then we fall foul of the composition rule
> forward/backward issue again.

Which forward rules would conflict with a backward rule triggered by
U+FE0E?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 15:58:02 GMT) Full text and rfc822 format available.

Message #171 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 17:57:04 +0200
>>>>> On Mon, 05 Jun 2023 18:35:37 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Mon, 05 Jun 2023 16:27:28 +0200
    >> 
    Eli> Hmm... I thought just changing the fontset in fontset.el would be
    Eli> enough.
    >> 
    >> Itʼs almost enough to do that, and to check if the triggering
    >> character is U+FE0E, but then we fall foul of the composition rule
    >> forward/backward issue again.

    Eli> Which forward rules would conflict with a backward rule triggered by
    Eli> U+FE0E?

All the ones for the non-emoji codepoints that still need to be
composed as emoji sometimes, eg U+261D:

"\N{U+261D}"
"\N{U+261D}\N{U+1F3FB}"
"\N{U+261D}\N{U+1F3FC}"
"\N{U+261D}\N{U+1F3FD}"
"\N{U+261D}\N{U+1F3FE}"
"\N{U+261D}\N{U+1F3FF}"

to which we add:

"\N{U+261D}\N{U+FE0E}"
"\N{U+261D}\N{U+FE0F}"

(and not adding those doesnʼt help).

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 16:21:01 GMT) Full text and rfc822 format available.

Message #174 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 18:20:08 +0200
>>>>> On Mon, 05 Jun 2023 17:57:04 +0200, Robert Pluim <rpluim <at> gmail.com> said:

>>>>> On Mon, 05 Jun 2023 18:35:37 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
    >>> From: Robert Pluim <rpluim <at> gmail.com>
    >>> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >>> Date: Mon, 05 Jun 2023 16:27:28 +0200
    >>> 
    Eli> Hmm... I thought just changing the fontset in fontset.el would be
    Eli> enough.
    >>> 
    >>> Itʼs almost enough to do that, and to check if the triggering
    >>> character is U+FE0E, but then we fall foul of the composition rule
    >>> forward/backward issue again.

    Eli> Which forward rules would conflict with a backward rule triggered by
    Eli> U+FE0E?

    Robert> All the ones for the non-emoji codepoints that still need to be
    Robert> composed as emoji sometimes, eg U+261D:

Oh, and all the <foo>+skin tone ones. And probably more.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 16:40:02 GMT) Full text and rfc822 format available.

Message #177 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 19:39:37 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Mon, 05 Jun 2023 17:57:04 +0200
> 
> >>>>> On Mon, 05 Jun 2023 18:35:37 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     Eli> Which forward rules would conflict with a backward rule triggered by
>     Eli> U+FE0E?
> 
> All the ones for the non-emoji codepoints that still need to be
> composed as emoji sometimes, eg U+261D:
> 
> "\N{U+261D}"
> "\N{U+261D}\N{U+1F3FB}"
> "\N{U+261D}\N{U+1F3FC}"
> "\N{U+261D}\N{U+1F3FD}"
> "\N{U+261D}\N{U+1F3FE}"
> "\N{U+261D}\N{U+1F3FF}"

Couldn't we put these in the slots of #x1F3FB..#x1F3FF instead, as
backward rules?  As long as we don't have a forward rule starting with
#x261D, we could have backward rules for it triggered by #x1F3Fx and
#xFE0x, right?
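To make the forward/backward distinction concrete, here is a minimal Python model of the proposal. It is not Emacs's composition-function-table API: the table layout and the names `BACKWARD_RULES`, `find_composition` and `clusters` are hypothetical. Each rule is stored in the slot of the trailing trigger character (a skin-tone modifier or a variation selector) and looks back one character for its base.

```python
# Hypothetical model only: NOT Emacs's composition-function-table API.
# Each "backward" rule lives in the slot of the trailing trigger character
# and looks back exactly one character for the base it composes with.

SKIN_TONES = range(0x1F3FB, 0x1F3FF + 1)  # U+1F3FB..U+1F3FF
VARIATION_SELECTORS = (0xFE0E, 0xFE0F)    # VS15 (text), VS16 (emoji)

# Trigger codepoint -> set of base codepoints it may look back to.
# U+261D is the example from the thread; no rule is keyed on U+261D itself.
BACKWARD_RULES = {t: {0x261D} for t in [*SKIN_TONES, *VARIATION_SELECTORS]}


def find_composition(text, pos):
    """Return (start, end) of the cluster triggered by TEXT[POS], or None."""
    if pos == 0:
        return None  # nothing to look back at
    bases = BACKWARD_RULES.get(ord(text[pos]))
    if bases and ord(text[pos - 1]) in bases:
        return (pos - 1, pos + 1)
    return None


def clusters(text):
    """Scan TEXT left to right and collect every cluster the rules find."""
    found = []
    for i in range(len(text)):
        span = find_composition(text, i)
        if span:
            found.append(span)
    return found
```

With this model, `clusters("\u261D\U0001F3FB")` returns `[(0, 2)]`, while a lone `"\u261D"` yields `[]` because nothing is stored in U+261D's own slot. That is the precondition stated above: the scheme only works as long as no forward rule starts with U+261D.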




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Mon, 05 Jun 2023 16:43:02 GMT)

Message #180 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Mon, 05 Jun 2023 19:41:55 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Mon, 05 Jun 2023 18:20:08 +0200
> 
>     Eli> Which forward rules would conflict with a backward rule triggered by
>     Eli> U+FE0E?
> 
>     Robert> All the ones for the non-emoji codepoints that still need to be
>     Robert> composed as emoji sometimes, eg U+261D:
> 
> Oh, and all the <foo>+skin tone ones. And probably more.

What do you mean by <foo>+skin?  Can you give a few examples?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Tue, 06 Jun 2023 07:25:01 GMT)

Message #183 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Tue, 06 Jun 2023 09:24:03 +0200
>>>>> On Mon, 05 Jun 2023 19:41:55 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Mon, 05 Jun 2023 18:20:08 +0200
    >> 
    Eli> Which forward rules would conflict with a backward rule triggered by
    Eli> U+FE0E?
    >> 
    Robert> All the ones for the non-emoji codepoints that still need to be
    Robert> composed as emoji sometimes, eg U+261D:
    >> 
    >> Oh, and all the <foo>+skin tone ones. And probably more.

    Eli> What do you mean by <foo>+skin?  Can you give a few examples?

Anything using the skin-tone modifiers U+1F3FB..U+1F3FF, such as
U+1F44B U+1F3FB or U+1F3C4 U+1F3FB.
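These examples share one shape: an emoji base followed by one of the five skin-tone modifiers, just like the U+261D list earlier in this thread. The short Python sketch below generates such variants for a given base. The helper names `skin_tone_variants`, `has_skin_tone` and `codepoints` are hypothetical and exist only for illustration.

```python
# The five EMOJI MODIFIER FITZPATRICK characters, U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]


def skin_tone_variants(base):
    """Return BASE on its own, then BASE followed by each skin-tone modifier."""
    b = chr(base)
    return [b] + [b + m for m in SKIN_TONE_MODIFIERS]


def has_skin_tone(seq):
    """True if SEQ is exactly one base codepoint plus one skin-tone modifier."""
    return len(seq) == 2 and seq[1] in SKIN_TONE_MODIFIERS


def codepoints(seq):
    """Format SEQ as space-separated hex codepoints, e.g. '1F44B 1F3FB'."""
    return " ".join(f"{ord(c):X}" for c in seq)
```

For example, `[codepoints(s) for s in skin_tone_variants(0x1F44B)[:2]]` gives `['1F44B', '1F44B 1F3FB']`. Every base with skin-tone support contributes six such entries.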

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Tue, 06 Jun 2023 07:29:01 GMT)

Message #186 received at 63731 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Tue, 06 Jun 2023 09:28:04 +0200
>>>>> On Mon, 05 Jun 2023 19:39:37 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
    >> Date: Mon, 05 Jun 2023 17:57:04 +0200
    >> 
    >> >>>>> On Mon, 05 Jun 2023 18:35:37 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
    >> 
    Eli> Which forward rules would conflict with a backward rule triggered by
    Eli> U+FE0E?
    >> 
    >> All the ones for the non-emoji codepoints that still need to be
    >> composed as emoji sometimes, eg U+261D:
    >> 
    >> "\N{U+261D}"
    >> "\N{U+261D}\N{U+1F3FB}"
    >> "\N{U+261D}\N{U+1F3FC}"
    >> "\N{U+261D}\N{U+1F3FD}"
    >> "\N{U+261D}\N{U+1F3FE}"
    >> "\N{U+261D}\N{U+1F3FF}"

    Eli> Couldn't we put these in the slots of #x1F3FB..#x1F3FF instead, as
    Eli> backward rules?  As long as we don't have a forward rule starting with
    Eli> #x261D, we could have backward rules for it triggered by #x1F3Fx and
    Eli> #xFE0x, right?

Yes, we could invert the whole composition rules setup, and make them
all work backwards, but then it will almost certainly all break again
with the next release of Unicode. Adding a special case for FE0E in
font_range is going to be more robust.
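The alternative proposed here keeps the existing forward rules and lets a following U+FE0E override their result at font-selection time. The Python below only sketches that decision. It is not Emacs's C function `font_range`, and the name `choose_presentation` and its `composed_as_emoji` parameter (standing in for whatever the composition rules decided) are hypothetical.

```python
VS15 = 0xFE0E  # VARIATION SELECTOR-15: requests text presentation


def choose_presentation(text, pos, composed_as_emoji):
    """Pick "text" or "emoji" for the character at TEXT[POS].

    COMPOSED_AS_EMOJI is what the (forward) composition rules decided;
    a following VS15 overrides it, which is the proposed special case.
    """
    nxt = ord(text[pos + 1]) if pos + 1 < len(text) else None
    if nxt == VS15:
        return "text"  # explicit VS15 wins over the composition result
    return "emoji" if composed_as_emoji else "text"
```

The appeal, as argued above, is that the override lives in one place and does not depend on how the composition rules are laid out, so new sequences added in later Unicode releases should not disturb it.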

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63731; Package emacs. (Tue, 06 Jun 2023 11:54:02 GMT)

Message #189 received at 63731 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Tue, 06 Jun 2023 14:53:41 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> Date: Tue, 06 Jun 2023 09:28:04 +0200
> 
> >>>>> On Mon, 05 Jun 2023 19:39:37 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
> 
>     >> From: Robert Pluim <rpluim <at> gmail.com>
>     >> Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
>     >> Date: Mon, 05 Jun 2023 17:57:04 +0200
>     >> 
>     >> >>>>> On Mon, 05 Jun 2023 18:35:37 +0300, Eli Zaretskii <eliz <at> gnu.org> said:
>     >> 
>     Eli> Which forward rules would conflict with a backward rule triggered by
>     Eli> U+FE0E?
>     >> 
>     >> All the ones for the non-emoji codepoints that still need to be
>     >> composed as emoji sometimes, eg U+261D:
>     >> 
>     >> "\N{U+261D}"
>     >> "\N{U+261D}\N{U+1F3FB}"
>     >> "\N{U+261D}\N{U+1F3FC}"
>     >> "\N{U+261D}\N{U+1F3FD}"
>     >> "\N{U+261D}\N{U+1F3FE}"
>     >> "\N{U+261D}\N{U+1F3FF}"
> 
>     Eli> Couldn't we put these in the slots of #x1F3FB..#x1F3FF instead, as
>     Eli> backward rules?  As long as we don't have a forward rule starting with
>     Eli> #x261D, we could have backward rules for it triggered by #x1F3Fx and
>     Eli> #xFE0x, right?
> 
> Yes, we could invert the whole composition rules setup, and make them
> all work backwards, but then it will almost certainly all break again
> with the next release of Unicode. Adding a special case for FE0E in
> font_range is going to be more robust.

I don't think it could break, since such sequences are all likely to
be triggered by special codepoints that follow the U+2xxx characters.
Our win would be a much simpler setup.

But okay, let's try to do it this way.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 05 Jul 2023 11:24:06 GMT)



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.