GNU bug report logs - #52067
29.0.50; string-glyph-split halts on certain emoji strings

Package: emacs

Reported by: PAVLOS MARAGAKIS <paul.maragakis <at> icloud.com>

Date: Tue, 23 Nov 2021 23:02:01 UTC

Severity: normal

Found in version 29.0.50

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 52067 in the body.
You can then email your comments to 52067 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#52067; Package emacs. (Tue, 23 Nov 2021 23:02:02 GMT)

Acknowledgement sent to PAVLOS MARAGAKIS <paul.maragakis <at> icloud.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 23 Nov 2021 23:02:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: PAVLOS MARAGAKIS <paul.maragakis <at> icloud.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; string-glyph-split halts on certain emoji strings
Date: Tue, 23 Nov 2021 18:01:12 -0500
In a clean instance of Emacs, paste the following lines
into the *scratch* buffer and evaluate the lines starting
with string-glyph-split.  The output is shown below each form:

(string-glyph-split "🌍🦹")
("🌍" "🦹")

(string-glyph-split "✈✈")
("✈" "✈")

(string-glyph-split "🌍✈️")
("🌍" "✈")

(string-glyph-split "✈️🌍")

Evaluating the last form hangs Emacs; C-g stops the evaluation.
The expected behavior is to split the string into two glyphs.




In GNU Emacs 29.0.50 (build 2, aarch64-apple-darwin21.1.0, NS appkit-2113.00 Version 12.0.1 (Build 21A559))
of 2021-11-21 built on MacbookPro13.local
Repository revision: b7db7eb2c7b8ac1bddf4afa9ccf9b30ebeb0224e
Repository branch: master
Windowing system distributor 'Apple', version 10.3.2113
System Description:  macOS 12.0.1

Configured using:
'configure --disable-silent-rules
--enable-locallisppath=/usr/local/share/emacs/28.0.50/site-lisp
--prefix=/usr/local/opt/gccemacs --without-dbus --without-imagemagick
--with-mailutils --with-ns --disable-ns-self-contained --with-cairo
--with-modules --with-xml2 --with-gnutls --with-json --with-rsvg
--with-native-compilation CC=/usr/bin/clang
CFLAGS=-I/opt/homebrew/lib/gcc/11/include
'LDFLAGS=-L/opt/homebrew/lib/gcc/11/
-I/opt/homebrew/lib/gcc/11/include'
CPPFLAGS=-I/opt/homebrew/opt/libffi/include
'PKG_CONFIG_PATH=/opt/homebrew/opt/libffi/lib/pkgconfig --no-create
--no-recursion''

Configured features:
ACL GLIB GNUTLS JSON LCMS2 LIBXML2 MODULES NATIVE_COMP NOTIFY KQUEUE NS
PDUMPER PNG RSVG THREADS TOOLKIT_SCROLL_BARS WEBP XIM ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media rmc puny
dired dired-loaddefs rfc822 mml mml-sec epa derived epg rfc6068
epg-config gnus-util rmail rmail-loaddefs auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json map
text-property-search seq gv byte-opt bytecomp byte-compile cconv
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils time-date subr-x help-fns radix-tree cl-print debug backtrace
find-func help-mode cl-loaddefs cl-lib iso-transl tooltip eldoc paren
electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/ns-win ns-win ucs-normalize mule-util term/common-win tool-bar dnd
fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow
isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget keymap hashtable-print-readable backquote threads
kqueue cocoa ns lcms2 multi-tty make-network-process native-compile
emacs)

Memory information:
((conses 16 76126 7924)
(symbols 48 7053 0)
(strings 32 21398 1914)
(string-bytes 1 725508)
(vectors 16 15268)
(vector-slots 8 317989 13284)
(floats 8 26 61)
(intervals 56 341 0)
(buffers 992 13))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52067; Package emacs. (Wed, 24 Nov 2021 03:52:02 GMT)

Message #8 received at 52067 <at> debbugs.gnu.org:

From: Paul Maragakis <paul.maragakis <at> icloud.com>
To: 52067 <at> debbugs.gnu.org
Subject: Re: bug#52067: Acknowledgement (29.0.50; string-glyph-split halts on
 certain emoji strings)
Date: Tue, 23 Nov 2021 22:51:07 -0500
The logic in string-glyph-split expects the first two elements in the result
from find-composition-internal to give the start and end of a multibyte grapheme,
and expects the result to be nil when there is a regular character at position POS.
However, this isn't always the case.

Let's call x the argument POS in find-composition-internal, 
and "interval" the first two elements of the return value.

The following example works as expected: an x of 0 or 1 returns the interval (0 2),
and an x of 2 or 3 returns (2 4).

(null
 (pp
  (mapcar '(lambda (x) (list x (find-composition-internal x nil "✈✈" nil))) '(0 1 2 3 4))))
((0
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (1
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (2
  (2 4
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (3
  (2 4
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (4 nil))
nil


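In the following case, however, an x of 2 returns the interval (0 2), which ends
where the search started.  string-glyph-split then sets its start back to 2 and
repeats the same call, so the loop never advances.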

(null
 (pp
  (mapcar '(lambda (x) (list x (find-composition-internal x nil "✈️🌍" nil))) '(0 1 2 3))))
((0
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (1
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (2
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (3 nil))
nil


Interestingly, in the following case, an x of 0, 1, 2, or 3 returns (0 2) every time.

(null
 (pp
  (mapcar '(lambda (x) (list x (find-composition-internal x nil "✈️🌍🌍" nil))) '(0 1 2 3 4))))
((0
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (1
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (2
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (3
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (4 nil))
nil


And in the following case, an x of 3 returns (3 5).

(null
 (pp
  (mapcar '(lambda (x) (list x (find-composition-internal x nil "✈️🌍✈️" nil))) '(0 1 2 3 4 5))))
((0
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (1
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (2
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (3
  (3 5
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (4
  (3 5
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (5 nil))
nil




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52067; Package emacs. (Wed, 24 Nov 2021 04:59:01 GMT)

Message #11 received at 52067 <at> debbugs.gnu.org:

From: Paul Maragakis <paul.maragakis <at> icloud.com>
To: 52067 <at> debbugs.gnu.org
Subject: possible fix for string-glyph-split halts on certain emoji strings.
Date: Tue, 23 Nov 2021 23:58:17 -0500
The following code fixes this bug, though there might be better ways to fix it for someone who understands the domain.
I don't know much about glyph/grapheme representations, so although this code passes my limited tests, it may break other things.

(defun pm-string-glyph-split (string)
  "Split STRING into a list of strings representing separate glyphs.
This takes into account combining characters and grapheme clusters."
  (let ((result nil)
        (start 0)
        (laststart -1) ;; the last start of a character with the composition property
        comp)
    (while (< start (length string))
      (setq comp (find-composition-internal start nil string nil))
      (if (and comp (/= laststart (car comp)))  ;; check that we don't return to same start
          (progn
            (push (substring string (car comp) (cadr comp)) result)
            (setq laststart start)  ;; keep the start of the last successful search.
            (setq start (cadr comp)))
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))


Compare to the original:

(defun string-glyph-split (string)
  "Split STRING into a list of strings representing separate glyphs.
This takes into account combining characters and grapheme clusters."
  (let ((result nil)
        (start 0)
        comp)
    (while (< start (length string))
      (if (setq comp (find-composition-internal start nil string nil))
          (progn
            (push (substring string (car comp) (cadr comp)) result)
            (setq start (cadr comp)))
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52067; Package emacs. (Wed, 24 Nov 2021 07:32:02 GMT)

Message #14 received at 52067 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Paul Maragakis <paul.maragakis <at> icloud.com>
Cc: 52067 <at> debbugs.gnu.org
Subject: Re: bug#52067: 29.0.50; string-glyph-split halts on certain emoji
 strings
Date: Wed, 24 Nov 2021 08:30:55 +0100
Paul Maragakis <paul.maragakis <at> icloud.com> writes:

> The logic in string-glyph-split expects the first two elements in the result
> from find-composition-internal to give the start and end of a multibyte grapheme
> and return nil when there is a regular character at position POS.  However, this 
> isn't always the case.

Yup.  

Paul Maragakis <paul.maragakis <at> icloud.com> writes:

> The following code fixes this bug, though there might be better ways
> to fix it for someone who understands the domain.

Thanks.  `find-composition' takes a LIMIT parameter, and that'll
make it avoid searching back into the bit of the string that we've
already handled.  So I did that instead in Emacs 29.
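
For readers following along, here is a minimal sketch of what passing
LIMIT could look like.  It is an illustration under assumptions, not
necessarily the exact change that was committed: the name
`my-string-glyph-split' is hypothetical, and using (1+ start), capped at
the string length, as the LIMIT value is an assumption meant to keep the
search from looking backward past START.

```elisp
;; Hypothetical name, chosen to avoid shadowing the real function.
(defun my-string-glyph-split (string)
  "Split STRING into a list of strings representing separate glyphs.
Sketch only: passes a LIMIT to `find-composition-internal'."
  (let ((result nil)
        (start 0)
        comp)
    (while (< start (length string))
      (if (setq comp (find-composition-internal
                      start
                      ;; Assumed LIMIT: one past START, so the search
                      ;; does not go back into text already handled.
                      (min (length string) (1+ start))
                      string nil))
          (progn
            ;; A composition was found: take it as one glyph.
            (push (substring string (car comp) (cadr comp)) result)
            (setq start (cadr comp)))
        ;; No composition here: take a single character.
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))
```

With this version, (my-string-glyph-split "✈️🌍") should return instead
of looping; the exact split still depends on the fonts and composition
rules available in the running Emacs.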

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 29.1, send any further explanations to 52067 <at> debbugs.gnu.org and PAVLOS MARAGAKIS <paul.maragakis <at> icloud.com>. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 24 Nov 2021 07:32:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52067; Package emacs. (Wed, 24 Nov 2021 15:16:02 GMT)

Message #19 received at 52067 <at> debbugs.gnu.org:

From: Paul Maragakis <paul.maragakis <at> icloud.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 52067 <at> debbugs.gnu.org
Subject: Re: bug#52067: 29.0.50; string-glyph-split halts on certain emoji
 strings
Date: Wed, 24 Nov 2021 10:15:07 -0500
Excellent---and thanks for the explanation!  
I confirm that the latest Emacs 29 fixes the bug.  
You can close this ticket.

Paul





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52067; Package emacs. (Wed, 24 Nov 2021 16:16:02 GMT)

Message #22 received at 52067 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Paul Maragakis <paul.maragakis <at> icloud.com>
Cc: 52067 <at> debbugs.gnu.org
Subject: Re: bug#52067: 29.0.50; string-glyph-split halts on certain emoji
 strings
Date: Wed, 24 Nov 2021 17:14:54 +0100
Paul Maragakis <paul.maragakis <at> icloud.com> writes:

> Excellent---and thanks for the explanation!  
> I confirm that the latest Emacs 29 fixes the bug.  
> You can close this ticket.

Thanks for checking; closed now.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 23 Dec 2021 12:24:08 GMT)

