GNU bug report logs - #49066
26.3; Segmentation fault on specific utf8 string


Package: emacs;

Reported by: "Miguel V. S. Frasson" <mvsfrasson <at> gmail.com>

Date: Wed, 16 Jun 2021 21:08:02 UTC

Severity: normal

Tags: patch

Found in version 26.3

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 49066 in the body.
You can then email your comments to 49066 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Wed, 16 Jun 2021 21:08:02 GMT)

Acknowledgement sent to "Miguel V. S. Frasson" <mvsfrasson <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 16 Jun 2021 21:08:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Miguel V. S. Frasson" <mvsfrasson <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.3; Segmentation fault on specific utf8 string
Date: Wed, 16 Jun 2021 18:07:06 -0300
Dear Emacs developers,

I was editing a "comma-separated values" (CSV) file for creating a
geographic map and tried some simple editing commands, which I now see
were irrelevant to reproducing the bug. I managed to isolate the problem.

It seems that my version of Emacs with a GUI is unable to display a
specific UTF-8 line of a file, possibly one mixing LTR and RTL text,
and crashes.

To help debug this, I read /usr/share/emacs/26.3/etc/DEBUG, downloaded
the Emacs sources from 2 places, and built them to see whether I could
reproduce the crash.

I tried these versions:

* from Ubuntu package
  GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.13)
  of 2019-12-24 -> emacs -Q foo -> always crashes (I tried more than 20
  times)
  same emacs, no gui -> emacs -nw -Q foo -> no crash

* git GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu) of 2021-06-16
  without toolkits and images --> no crash
  (1h30 of compilation time discouraged me from recompiling)

* 26.3 compiled from source downloaded from http://ftpmirror.gnu.org/emacs/
 - without toolkits -> no crash
 - with gtk3 -> no crash

So I am stuck with my usual Emacs, which has GTK but no debug symbols ...

How to reproduce:

1) Since just displaying the line crashes my Emacs, I would rather not
include it below. So please download the 641-byte file "foo" with

wget https://sites.icmc.usp.br/frasson/foo

Its content is just 1 line of UTF-8 text with the name of the Saint Pierre
and Miquelon Islands in several languages.

You can also obtain it by decoding the following base64 output with "base64 -d":

UTM0NjE3LNiz2KfZhiDYqNmK2YrYsSDZiNmF2YrZg9mE2YjZhizgprjgpr7gpoEg4Kaq4Ka/4Kav
4Ka84KeH4KawIOCmkyDgpq7gpr/gppXigIzgprLgp4vgpoEsU2FpbnQtUGllcnJlIHVuZCBNaXF1
ZWxvbixTYWludCBQaWVycmUgYW5kIE1pcXVlbG9uLFNhbiBQZWRybyB5IE1pcXVlbMOzbixTYWlu
dC1QaWVycmUtZXQtTWlxdWVsb24szqPOsc65zr0gzqDOuc61z4EgzrrOsc65IM6czrnOus61zrvP
jM69LOCkuOCkvuCkgS3gpKrgpY3gpK/gpYfgpLAg4KSU4KSwIOCkruClgOCkleClh+CksuCli+Ck
gixTYWludC1QaWVycmUgw6lzIE1pcXVlbG9uLFNhaW50IFBpZXJyZSBkYW4gTWlxdWVsb24sU2Fp
bnQtUGllcnJlIGUgTWlxdWVsb24s44K144Oz44OU44Ko44O844Or5bO244O744Of44Kv44Ot44Oz
5bO2LOyDne2UvOyXkOultCDrr7jtgbTrobEsU2FpbnQtUGllcnJlIGVuIE1pcXVlbG9uLFNhaW50
LVBpZXJyZSBpIE1pcXVlbG9uLFNhaW50LVBpZXJyZSBlIE1pcXVlbG9uLNCh0LXQvS3Qn9GM0LXR
gCDQuCDQnNC40LrQtdC70L7QvSxTYWludC1QaWVycmUgb2NoIE1pcXVlbG9uLFNhaW50IFBpZXJy
ZSB2ZSBNaXF1ZWxvbixTYWludC1QaWVycmUgdsOgIE1pcXVlbG9uLOWco+earuWfg+WwlOWSjOWv
huWFi+mahue+pOWymwo=

2) emacs -nw -Q foo

Ok, exit Emacs, no crash.

3) emacs -Q foo

Emacs crashes :-X

4) I see that with "emacs -nw -Q foo", if I delete the initial Q (or
maybe a character that resembles Q), the text direction changes abruptly
and display/navigation goes crazy: just navigating with the left and
right arrow keys, point jumps from the first line to the last, and some
up and down key presses jump a long way.  This happens even with the
git trunk Emacs that I compiled.

If you would like to see this, I recorded a screencast (2.63Mb):
wget https://sites.icmc.usp.br/frasson/emacs-navigation.mp4

From the command line I get the following output:

Fatal error 11: Segmentation fault
Backtrace:
emacs[0x51ab42]
emacs[0x500211]
emacs[0x518f14]
emacs[0x51914d]
emacs[0x5191cd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f7fca29b3c0]
emacs[0x5ebe9b]
emacs[0x5ef70d]
emacs[0x58a752]
emacs[0x57913c]
emacs[0x5b8174]
emacs[0x57bb61]
emacs[0x5790bb]
emacs[0x5783fa]
emacs[0x4369ac]
emacs[0x443276]
emacs[0x5d9aa8]
emacs[0x5ddbe0]
emacs[0x44f664]
emacs[0x44d695]
emacs[0x4556f8]
emacs[0x45a843]
emacs[0x46f0c3]
emacs[0x472183]
emacs[0x57829e]
emacs[0x43a016]
emacs[0x45e079]
emacs[0x50a447]
emacs[0x50dad0]
emacs[0x50f1e4]
emacs[0x578206]
emacs[0x5005d4]
emacs[0x578175]
emacs[0x500573]
emacs[0x5057b7]
emacs[0x505b18]
emacs[0x4206d2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f7fc9f870b3]
emacs[0x4213de]
Falha de segmentação

Best regards

Miguel


In GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.13)
 of 2019-12-24 built on lcy01-amd64-029
Windowing system distributor 'The X.Org Foundation', version 11.0.12009000
System Description:    Ubuntu 20.04.2 LTS

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
saida-raw50.csv has auto save data; consider M-x recover-this-file
Mark set
Type y, n, ! or SPC (the space bar):
Defining kbd macro...
Mark set [2 times]
Replaced 169 occurrences
Keyboard macro defined

Configured using:
 'configure --build=x86_64-linux-gnu --prefix=/usr
 '--includedir=${prefix}/include' '--mandir=${prefix}/share/man'
 '--infodir=${prefix}/share/info' --sysconfdir=/etc --localstatedir=/var
 --disable-silent-rules '--libdir=${prefix}/lib/x86_64-linux-gnu'
 '--libexecdir=${prefix}/lib/x86_64-linux-gnu' --disable-maintainer-mode
 --disable-dependency-tracking --prefix=/usr --sharedstatedir=/var/lib
 --program-suffix=26 --with-modules --with-file-notification=inotify
 --with-mailutils --with-x=yes --with-x-toolkit=gtk3 --with-xwidgets
 --with-lcms2 'CFLAGS=-g -O2
 -fdebug-prefix-map=/build/emacs26-XQGPla/emacs26-26.3~1.git96dd019=.
-fstack-protector-strong
 -Wformat -Werror=format-security -no-pie' 'CPPFLAGS=-Wdate-time
 -D_FORTIFY_SOURCE=2' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro
 -no-pie''

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS GLIB
NOTIFY LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS XWIDGETS
LIBSYSTEMD LCMS2

Important settings:
  value of $LANG: pt_BR.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv dired dired-loaddefs format-spec rfc822 mml
mml-sec password-cache epa derived epg epg-config gnus-util rmail
rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils macros misearch multi-isearch kmacro
cl-extra help-mode easymenu cl-loaddefs cl-lib novice elec-pair
time-date mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote threads dbusbind
inotify lcms2 dynamic-setting system-font-setting font-render-setting
xwidget-internal move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 99690 8444)
 (symbols 48 20739 1)
 (miscs 40 284 240)
 (strings 32 29677 1323)
 (string-bytes 1 787981)
 (vectors 16 15049)
 (vector-slots 8 550898 10514)
 (floats 8 51 224)
 (intervals 56 261 0)
 (buffers 992 13))


-- 
Miguel Vinicius Santini Frasson
mvsfrasson <at> gmail.com




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Wed, 16 Jun 2021 21:13:02 GMT)

Message #8 received at 49066 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: "Miguel V. S. Frasson" <mvsfrasson <at> gmail.com>
Cc: 49066 <at> debbugs.gnu.org
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Wed, 16 Jun 2021 23:12:44 +0200
"Miguel V. S. Frasson" <mvsfrasson <at> gmail.com> writes:

> * git GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu) of 2021-06-16
> without toolkits and images --> no crash
> (1h30 of compilation time discouraged me from recompiling)

I can reproduce the crash in Emacs 26.1, but not in Emacs 27.1, so I
guess this has been fixed in later versions of Emacs?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Wed, 16 Jun 2021 21:24:02 GMT)

Message #11 received at 49066 <at> debbugs.gnu.org:

From: "Miguel V. S. Frasson" <mvsfrasson <at> gmail.com>
To: 49066 <at> debbugs.gnu.org
Subject: file foo
Date: Wed, 16 Jun 2021 18:22:35 -0300
[Message part 1 (text/plain, inline)]
-- 
Miguel Vinicius Santini Frasson
mvsfrasson <at> gmail.com
[foo (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Thu, 17 Jun 2021 06:44:02 GMT)

Message #14 received at 49066 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 49066 <at> debbugs.gnu.org, mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Thu, 17 Jun 2021 09:43:40 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Wed, 16 Jun 2021 23:12:44 +0200
> Cc: 49066 <at> debbugs.gnu.org
> 
> "Miguel V. S. Frasson" <mvsfrasson <at> gmail.com> writes:
> 
> > * git GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu) of 2021-06-16
> > without toolkits and images --> no crash
> > (1h30 of compilation time discouraged me from recompiling)
> 
> I can reproduce the crash in Emacs 26.1, but not in Emacs 27.1, so I
> guess this has been fixed in later versions of Emacs?

I cannot reproduce at all, neither in Emacs 26 nor in all subsequent
versions.

Lars, can you show a backtrace from the crash?  Perhaps if I see that,
I could tell if it's a known (and fixed) problem.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Thu, 17 Jun 2021 07:44:02 GMT)

Message #17 received at 49066 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Thu, 17 Jun 2021 09:43:03 +0200
>>>>> On Thu, 17 Jun 2021 09:43:40 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> I can reproduce the crash in Emacs 26.1, but not in Emacs 27.1, so I
    >> guess this has been fixed in later versions of Emacs?

    Eli> I cannot reproduce at all, neither in Emacs 26 nor in all subsequent
    Eli> versions.

    Eli> Lars, can you show a backtrace from the crash?  Perhaps if I see that,
    Eli> I could tell if it's a known (and fixed) problem.

    Eli> Thanks.

This is from an optimized build of emacs-26.1. I can redo it with a
'-g3 -O0' if you want.

Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
ftfont_shape_by_flt (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=...)
    at ftfont.c:2573
2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
(gdb) bt
#0  ftfont_shape_by_fltPython Exception <class 'gdb.error'> value has been optimized out: 
 (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=)
    at ftfont.c:2573
#1  ftfont_shapePython Exception <class 'gdb.error'> value has been optimized out: 
 (lgstring=, lgstring <at> entry=XIL(0xaa2755)) at ftfont.c:2615
#2  0x00000000005d97f5 in xftfont_shape (lgstring=XIL(0xaa2755)) at xftfont.c:670
#3  0x000000000057fc2a in Ffont_shape_gstringPython Exception <class 'gdb.error'> value has been optimized out: 
 (gstring=) at font.c:4427
#4  0x000000000056fede in funcall_subr (subr=0x97fac0 <Sfont_shape_gstring>, numargs=numargs <at> entry=1, args=args <at> entry=0x7fffffff59a0)
    at eval.c:2844
#5  0x000000000056ecff in Ffuncall (nargs=<optimized out>, args=args <at> entry=0x7fffffff5998) at lisp.h:600


Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Thu, 17 Jun 2021 08:14:01 GMT)

Message #20 received at 49066 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 49066 <at> debbugs.gnu.org, larsi <at> gnus.org, mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Thu, 17 Jun 2021 11:13:17 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Lars Ingebrigtsen <larsi <at> gnus.org>,  49066 <at> debbugs.gnu.org,
>   mvsfrasson <at> gmail.com
> Date: Thu, 17 Jun 2021 09:43:03 +0200
> 
> This is from an optimized build of emacs-26.1. I can redo it with a
> '-g3 -O0' if you want.

That'd help.

> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
> ftfont_shape_by_flt (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=...)
>     at ftfont.c:2573
> 2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));

So, is 'g' a NULL pointer or something?  Or is 'lgstring' faulty in
some way?  IOW, what is the immediate reason for the segfault?

> (gdb) bt
> #0  ftfont_shape_by_fltPython Exception <class 'gdb.error'> value has been optimized out: 

What's the story with these Python exceptions?  Looks like some
problem in our .gdbinit?

>  (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=)
>     at ftfont.c:2573
> #1  ftfont_shapePython Exception <class 'gdb.error'> value has been optimized out: 
>  (lgstring=, lgstring <at> entry=XIL(0xaa2755)) at ftfont.c:2615
> #2  0x00000000005d97f5 in xftfont_shape (lgstring=XIL(0xaa2755)) at xftfont.c:670
> #3  0x000000000057fc2a in Ffont_shape_gstringPython Exception <class 'gdb.error'> value has been optimized out: 
>  (gstring=) at font.c:4427
> #4  0x000000000056fede in funcall_subr (subr=0x97fac0 <Sfont_shape_gstring>, numargs=numargs <at> entry=1, args=args <at> entry=0x7fffffff59a0)
>     at eval.c:2844
> #5  0x000000000056ecff in Ffuncall (nargs=<optimized out>, args=args <at> entry=0x7fffffff5998) at lisp.h:600

The backtrace stops too soon.  Can you show more?  I'd like at the
very least to see which sequence of characters causes the trouble.
From the above, I can only glean that we were performing a character
composition.

It could be some problem with the shaping engine: I guess versions
after Emacs 26 are built with HarfBuzz, not m17n-flt?  If you forcibly
use m17n-flt in a later Emacs, does it still not crash?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Thu, 17 Jun 2021 13:08:01 GMT)

Message #23 received at 49066 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, larsi <at> gnus.org, mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Thu, 17 Jun 2021 15:07:18 +0200
>>>>> On Thu, 17 Jun 2021 11:13:17 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Robert Pluim <rpluim <at> gmail.com>
    >> Cc: Lars Ingebrigtsen <larsi <at> gnus.org>,  49066 <at> debbugs.gnu.org,
    >> mvsfrasson <at> gmail.com
    >> Date: Thu, 17 Jun 2021 09:43:03 +0200
    >> 
    >> This is from an optimized build of emacs-26.1. I can redo it with a
    >> '-g3 -O0' if you want.

    Eli> That'd help.

Full backtrace from an unoptimized build:

Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
0x0000000000557a9d in AREF (array=XIL(0), idx=1) at lisp.h:1614
1614	  return XVECTOR (array)->contents[idx];
(gdb) bt
#0  0x0000000000557a9d in AREF (array=XIL(0), idx=1) at lisp.h:1614
#1  0x0000000000693602 in ftfont_shape_by_flt
    (lgstring=XIL(0xb64755), font=0x1308cb0 <bss_sbrk_buffer+8590480>, ft_face=0x340fef0, otf=0x342c810, matrix=0x1308da8 <bss_sbrk_buffer+8590728>) at ftfont.c:2573
#2  0x00000000006939c4 in ftfont_shape (lgstring=XIL(0xb64755)) at ftfont.c:2615
#3  0x0000000000695ae8 in xftfont_shape (lgstring=XIL(0xb64755)) at xftfont.c:670
#4  0x0000000000624f14 in Ffont_shape_gstring (gstring=XIL(0xb64755)) at font.c:4427
#5  0x000000000060714d in funcall_subr (subr=0xa41d60 <Sfont_shape_gstring>, numargs=1, args=0x7fffffff6830) at eval.c:2844
#6  0x0000000000606d80 in Ffuncall (nargs=2, args=0x7fffffff6828) at eval.c:2769
#7  0x000000000064ef3a in exec_byte_code
    (bytestr=XIL(0x81e114), vector=XIL(0x81e135), maxdepth=make_number(6), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:629
#8  0x0000000000607b03 in funcall_lambda (fun=XIL(0x81e0a5), nargs=5, arg_vector=0x81e135 <pure+964437>) at eval.c:3052
#9  0x0000000000606dc4 in Ffuncall (nargs=6, args=0x7fffffff6d20) at eval.c:2771
#10 0x000000000060392c in internal_condition_case_n (bfun=0x606c02 <Ffuncall>, nargs=6, args=0x7fffffff6d20, handlers=XIL(0xc090), hfun=
    0x43f2a4 <safe_eval_handler>) at eval.c:1412
#11 0x000000000043f519 in safe__call (inhibit_quit=false, nargs=6, func=XIL(0x8e6520), ap=0x7fffffff6e00) at xdisp.c:2617
#12 0x000000000043f60c in safe_call (nargs=6, func=XIL(0x8e6520)) at xdisp.c:2633
#13 0x000000000067e4e6 in autocmp_chars
    (rule=XIL(0xf2b705), charpos=40, bytepos=78, limit=42, win=0x103bc30 <bss_sbrk_buffer+5653520>, face=0x349d570, string=XIL(0))
    at composite.c:928
#14 0x000000000067fad8 in composition_reseat_it
    (cmp_it=0x7fffffff8f30, charpos=40, bytepos=78, endpos=464, w=0x103bc30 <bss_sbrk_buffer+5653520>, face=0x349d570, string=XIL(0))
    at composite.c:1228
#15 0x000000000044e88f in next_element_from_buffer (it=0x7fffffff86b0) at xdisp.c:8483
#16 0x000000000044ab2a in get_next_display_element (it=0x7fffffff86b0) at xdisp.c:7026
#17 0x00000000004715db in display_line (it=0x7fffffff86b0, cursor_vpos=3) at xdisp.c:21409
#18 0x0000000000466d36 in try_window (window=XIL(0x103bc35), pos=..., flags=1) at xdisp.c:17627
#19 0x00000000004648da in redisplay_window (window=XIL(0x103bc35), just_this_one_p=false) at xdisp.c:17074
#20 0x000000000045de89 in redisplay_window_0 (window=XIL(0x103bc35)) at xdisp.c:14831
#21 0x00000000006037bc in internal_condition_case_1
    (bfun=0x45de47 <redisplay_window_0>, arg=XIL(0x103bc35), handlers=XIL(0xb3de33), hfun=0x45de0f <redisplay_window_error>) at eval.c:1356
#22 0x000000000045dde4 in redisplay_windows (window=XIL(0x103bc35)) at xdisp.c:14811
#23 0x000000000045cd16 in redisplay_internal () at xdisp.c:14300
#24 0x000000000045ada7 in redisplay () at xdisp.c:13518
#25 0x0000000000563326 in read_char (commandflag=1, map=XIL(0x142c4b3), prev_event=XIL(0), used_mouse_menu=0x7fffffffdaef, end_time=0x0)
    at keyboard.c:2480
#26 0x000000000057056f in read_key_sequence
    (keybuf=0x7fffffffdc40, bufsize=30, prompt=XIL(0), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9147
#27 0x00000000005607c3 in command_loop_1 () at keyboard.c:1368
#28 0x0000000000603715 in internal_condition_case (bfun=0x5603b5 <command_loop_1>, handlers=XIL(0x5250), hfun=0x55fb97 <cmd_error>)
    at eval.c:1332
#29 0x00000000005600a6 in command_loop_2 (ignore=XIL(0)) at keyboard.c:1110
#30 0x0000000000602fed in internal_catch (tag=XIL(0xc6f0), func=0x560079 <command_loop_2>, arg=XIL(0)) at eval.c:1097
#31 0x0000000000560045 in command_loop () at keyboard.c:1089
#32 0x000000000055f76a in recursive_edit_1 () at keyboard.c:695
#33 0x000000000055f8ea in Frecursive_edit () at keyboard.c:766
#34 0x000000000055d58e in main (argc=2, argv=0x7fffffffe128) at emacs.c:1713

Lisp Backtrace:
"font-shape-gstring" (0xffff6830)
"auto-compose-chars" (0xffff6d28)
"redisplay_internal (C function)" (0x0)
(gdb) 

    >> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
    >> ftfont_shape_by_flt (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=...)
    >> at ftfont.c:2573
    >> 2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));

    Eli> So, is 'g' a NULL pointer or something?  Or is 'lgstring' faulty in
    Eli> some way?  IOW, what is the immediate reason for the
    Eli> segfault?

Itʼs lgstring, I think this is one of those 'nil's in lgstring

#0  0x0000000000557a9d in AREF (array=XIL(0), idx=1) at lisp.h:1614
1614	  return XVECTOR (array)->contents[idx];
(gdb) up
#1  0x0000000000693602 in ftfont_shape_by_flt (lgstring=XIL(0xb64755), font=0x1308cb0 <bss_sbrk_buffer+8590480>, ft_face=0x340fef0, 
    otf=0x342c810, matrix=0x1308da8 <bss_sbrk_buffer+8590728>) at ftfont.c:2573
2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
(gdb) pp lgstring
[[#<font-object "-GOOG-Noto Sans Bengali-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1"> 2453 8204] nil [0 0 2453 20 16 -1 17 12 0 nil] [1 1 8204 658 0 -1 1 15 4 nil] nil nil nil [5 5 0 3039 11 0 12 7 5 nil] [6 6 1606 1044 11 0 11 8 3 nil] nil]
(gdb) p g
$2 = (MFLTGlyphFT *) 0x2e631e0
(gdb) p *g
$3 = {
  g = {
    c = 2453,
    code = 20,
    from = 0,
    to = 2,
    xadv = 1024,
    yadv = 0,
    ascent = 768,
    descent = 0,
    lbearing = -64,
    rbearing = 1024,
    xoff = 0,
    yoff = 0,
    encoded = 1,
    measured = 1,
    adjusted = 0,
    internal = 0
  },
  libotf_positioning_type = 0
}

    >> (gdb) bt
    >> #0  ftfont_shape_by_fltPython Exception <class 'gdb.error'> value has been optimized out: 

    Eli> What's the story with these Python exceptions?  Looks like some
    Eli> problem in our .gdbinit?

They donʼt happen with an unoptimized build.

    Eli> The backtrace stops too soon.  Can you show more?  I'd like at the
    Eli> very least to see which sequence of characters causes the trouble.
    Eli> From the above, I can only glean that we were performing a character
    Eli> composition.

This is enough to cause the crash: ক‌

That's #x995 followed by #x200c. Why are we trying to compose a ZWNJ?
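
For anyone who wants a two-character test file rather than the full
641-byte one, the sketch below encodes those two code points as UTF-8.
The function names utf8_encode_bmp and build_reproducer are invented
for this illustration and are not part of Emacs. Whether the resulting
file actually crashes a given build is an assumption: in this thread the
crash was only seen in GUI builds that shape text with m17n-flt.

```c
#include <assert.h>
#include <stddef.h>

/* Encode code point CP (U+0000..U+FFFF, surrogates excluded) as UTF-8
   into OUT, which must have room for 3 bytes.  Returns the number of
   bytes written, or 0 if CP is outside the supported range.
   Hypothetical helper, written only for this sketch.  */
static size_t
utf8_encode_bmp (unsigned cp, unsigned char *out)
{
  assert (out != NULL);
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not valid scalar values */
  if (cp <= 0xFFFF)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  return 0;
}

/* Write BENGALI LETTER KA (U+0995) followed by ZWNJ (U+200C) into BUF,
   which must have room for 6 bytes.  Returns the byte count.  */
static size_t
build_reproducer (unsigned char *buf)
{
  size_t n = utf8_encode_bmp (0x0995, buf);
  n += utf8_encode_bmp (0x200C, buf + n);
  return n;
}
```

Writing the six bytes from build_reproducer to a file and visiting it
with "emacs -Q" should exercise the same composition path as the full
file "foo".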

    Eli> It could be some problem with the shaping engine: I guess versions
    Eli> after Emacs 26 are built with HarfBuzz, not m17n-flt?  If you forcibly
    Eli> use m17n-flt in a later Emacs, does it still not crash?

emacs-27 built '--without-harfbuzz' and thus with m17n-flt crashes the same way.

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Thu, 17 Jun 2021 14:00:02 GMT)

Message #26 received at 49066 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>, Kenichi Handa <handa <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, larsi <at> gnus.org, mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Thu, 17 Jun 2021 16:59:42 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: larsi <at> gnus.org,  49066 <at> debbugs.gnu.org,  mvsfrasson <at> gmail.com
> Date: Thu, 17 Jun 2021 15:07:18 +0200
> 
> Full backtrace from an unoptimized build:

Thanks.

>     >> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
>     >> ftfont_shape_by_flt (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=...)
>     >> at ftfont.c:2573
>     >> 2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
> 
>     Eli> So, is 'g' a NULL pointer or something?  Or is 'lgstring' faulty in
>     Eli> some way?  IOW, what is the immediate reason for the
>     Eli> segfault?
> 
> Itʼs lgstring, I think this is one of those 'nil's in lgstring

Yes, I think so.  We can verify that by looking at the value of
g->g.to:

  (gdb) p *g
  $3 = {
    g = {
      c = 2453,
      code = 20,
      from = 0,
      to = 2, <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

And the LGLYPH whose index is 2 is indeed nil:

  (gdb) pp lgstring
  [[#<font-object "-GOOG-Noto Sans Bengali-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1"> 2453 8204] nil [0 0 2453 20 16 -1 17 12 0 nil] [1 1 8204 658 0 -1 1 15 4 nil] nil nil nil [5 5 0 3039 11 0 12 7 5 nil] [6 6 1606 1044 11 0 11 8 3 nil] nil]  ^^^

I think this is a bug in that loop: it should actually exit whenever
it finds the first LGLYPH that is nil, and update gstring.used
accordingly.  Something like this:

  for (i = 0; i < gstring.used; i++)
    {
      MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;

      if (NILP (LGSTRING_GLYPH (lgstring, g->g.from))
          || NILP (LGSTRING_GLYPH (lgstring, g->g.to)))
	break;
      g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
    }
  gstring.used = i;
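
To make the effect of that guard concrete, here is a self-contained
model of the loop. It is only a sketch: struct lglyph, struct flt_glyph
and remap_clusters are invented stand-ins, a NULL pointer plays the role
of nil, and the real code works on Lisp_Object values through the
LGSTRING_GLYPH, LGLYPH_FROM and LGLYPH_TO macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one LGLYPH; NULL models a nil slot.  */
struct lglyph { int from, to; };

/* Hypothetical stand-in for the from/to fields of an MFLTGlyph.  */
struct flt_glyph { int from, to; };

/* Translate the glyph-slot indices in GLYPHS[0..USED-1] into character
   indices, stopping at the first glyph whose FROM or TO slot is nil.
   Returns how many glyphs were translated, i.e. the new "used" count.  */
static int
remap_clusters (struct flt_glyph *glyphs, int used,
                struct lglyph *const *slots, int nslots)
{
  int i;

  for (i = 0; i < used; i++)
    {
      struct flt_glyph *g = &glyphs[i];

      assert (g->from >= 0 && g->from < nslots);
      assert (g->to >= 0 && g->to < nslots);
      /* The guard: never dereference a nil slot.  */
      if (slots[g->from] == NULL || slots[g->to] == NULL)
        break;
      g->from = slots[g->from]->from;
      g->to = slots[g->to]->to;
    }
  return i;
}
```

With the values from the gdb session (slots 0 and 1 filled, slot 2 nil,
and a glyph with from = 0, to = 2) this model returns 0 instead of
dereferencing the nil slot.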

CC'ing Handa-san, as I'm not really familiar with this code.

> This is enough to cause the crash: ক‌
> 
> That's #x995 followed by #x200c. Why are we trying to compose a ZWNJ?

Because #x995 is a Bengali character, and lisp/language/indian.el
says:

  (defconst bengali-composable-pattern
    (let ((table
	   '(("a" . "\u0981")		; SIGN CANDRABINDU
	     ("A" . "[\u0982\u0983]")	; SIGN ANUSVARA .. VISARGA
	     ("V" . "[\u0985-\u0994\u09E0\u09E1]") ; independent vowel
	     ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
	     ("B" . "[\u09AC\u09AF\u09B0\u09F0]")		; BA, YA, RA
	     ("R" . "[\u09B0\u09F0]")		; RA
	     ("n" . "\u09BC")		; NUKTA
	     ("v" . "[\u09BE-\u09CC\u09D7\u09E2\u09E3]") ; vowel sign
	     ("H" . "\u09CD")		; HALANT
	     ("T" . "\u09CE")		; KHANDA TA
	     ("N" . "\u200C")		; ZWNJ  <<<<<<<<<<<<<<<<<<<<<<<<<<<
	     ("J" . "\u200D")		; ZWJ
	     ("X" . "[\u0980-\u09FF]"))))	; all coverage
      (indian-compose-regexp
       (concat
	;; syllables with an independent vowel, or
	"\\(?:RH\\)?Vn?\\(?:J?HB\\)?v*n?a?A?\\|"
	;; consonant-based syllables, or
	"Cn?\\(?:J?HJ?Cn?\\)*\\(?:H[NJ]?\\|v*[NJ]?v?a?A?\\)\\|"
	;; another syllables with an independent vowel, or
	"\\(?:RH\\)?T\\|"
	;; special consonant form, or
	"JHB\\|"
	;; any other singleton characters
	"X")
       table))
    "Regexp matching a composable sequence of Bengali characters.")

(which is used below that in setting up composition-function-table for
Bengali characters).
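
To see why the two-character sequence is treated as one composable
unit, the sketch below hand-codes only the "consonant-based syllables"
branch of the pattern above. It is a simplified model, not the regexp
engine: the helper names are invented, and it assumes that
indian-compose-regexp simply substitutes each letter with its character
class from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Character classes copied from the table quoted above (only the ones
   used by the consonant branch).  */
static int is_C (unsigned c)
{ return (c >= 0x0995 && c <= 0x09B9) || (c >= 0x09DC && c <= 0x09DF) || c == 0x09F1; }
static int is_n (unsigned c) { return c == 0x09BC; }
static int is_v (unsigned c)
{ return (c >= 0x09BE && c <= 0x09CC) || c == 0x09D7 || c == 0x09E2 || c == 0x09E3; }
static int is_H (unsigned c) { return c == 0x09CD; }
static int is_a (unsigned c) { return c == 0x0981; }
static int is_A (unsigned c) { return c == 0x0982 || c == 0x0983; }
static int is_NJ (unsigned c) { return c == 0x200C || c == 0x200D; }
static int is_J (unsigned c) { return c == 0x200D; }

/* Consume at most one character of class PRED at position POS.  */
static size_t
opt (int (*pred) (unsigned), const unsigned *s, size_t len, size_t pos)
{
  return (pos < len && pred (s[pos])) ? pos + 1 : pos;
}

/* Simplified matcher for  Cn?(?:J?HJ?Cn?)*(?:H[NJ]?|v*[NJ]?v?a?A?)
   anchored at S[0].  Returns the number of code points matched,
   or 0 if S does not start with a consonant.  */
static size_t
match_consonant_syllable (const unsigned *s, size_t len)
{
  size_t pos;

  assert (s != NULL);
  if (len == 0 || !is_C (s[0]))
    return 0;
  pos = opt (is_n, s, len, 1);
  for (;;)                      /* (?:J?HJ?Cn?)*  */
    {
      size_t p = opt (is_J, s, len, pos);
      if (!(p < len && is_H (s[p])))
        break;
      p = opt (is_J, s, len, p + 1);
      if (!(p < len && is_C (s[p])))
        break;
      pos = opt (is_n, s, len, p + 1);
    }
  if (pos < len && is_H (s[pos]))       /* H[NJ]?  */
    return opt (is_NJ, s, len, pos + 1);
  while (pos < len && is_v (s[pos]))    /* v*[NJ]?v?a?A?  */
    pos++;
  pos = opt (is_NJ, s, len, pos);
  pos = opt (is_v, s, len, pos);
  pos = opt (is_a, s, len, pos);
  return opt (is_A, s, len, pos);
}
```

For the input U+0995 U+200C the second alternative matches with an
empty "v*" and the ZWNJ taken by "[NJ]?", so both characters end up in
the same composition.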

>     Eli> It could be some problem with the shaping engine: I guess versions
>     Eli> after Emacs 26 are built with HarfBuzz, not m17n-flt?  If you forcibly
>     Eli> use m17n-flt in a later Emacs, does it still not crash?
> 
> emacs-27 built '--without-harfbuzz' and thus with m17n-flt crashes the same way.

Yes, it figures.

I hope Handa-san will suggest a solution, for those who want to stick
with m17n-flt.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Thu, 17 Jun 2021 15:05:01 GMT)

Message #29 received at 49066 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, rpluim <at> gmail.com, larsi <at> gnus.org,
 mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Thu, 17 Jun 2021 18:04:26 +0300
> Date: Thu, 17 Jun 2021 16:59:42 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 49066 <at> debbugs.gnu.org, larsi <at> gnus.org, mvsfrasson <at> gmail.com
> 
> > This is enough to cause the crash: ক‌
> > 
> > That's #x995 followed by #x200c. Why are we trying to compose a ZWNJ?
> 
> Because #x995 is a Bengali character, and lisp/language/indian.el
> says:

Btw, I think there's a bug in those patterns: ZWJ and ZWNJ shouldn't
compose unless they are followed by a character.  See section 12.2 in
the Unicode Standard.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Sun, 27 Jun 2021 02:30:02 GMT)

Message #32 received at 49066 <at> debbugs.gnu.org:

From: handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, rpluim <at> gmail.com, eggert <at> cs.ucla.edu, larsi <at> gnus.org,
 mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Sun, 27 Jun 2021 11:29:28 +0900
Hi,

>   (gdb) pp lgstring
>   [[#<font-object "-GOOG-Noto Sans Bengali-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1"> 2453 8204] nil [0 0 2453 20 16 -1 17 12 0 nil] [1 1 8204 658 0 -1 1 15 4 nil] nil nil nil [5 5 0 3039 11 0 12 7 5 nil] [6 6 1606 1044 11 0 11 8 3 nil] nil]  ^^^

> I think this is a bug in that loop: it should actually exit whenever
> it finds the first LGLYPH that is nil, and update gstring.used
> accordingly.  Something like this:

>   for (i = 0; i < gstring.used; i++)
>     {
>       MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;

>       if (NILP (LGSTRING_GLYPH (lgstring, g->g.from))
>           || NILP (LGSTRING_GLYPH (lgstring, g->g.to)))
> 	break;
>       g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
>       g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
>     }
>   gstring.used = i;

I don't think so, because the glyphs at indices g->g.from and g->g.to
should never be nil.

> > This is enough to cause the crash: ক‌

I clearly remember that rendering that string with m17n-flt worked
before, so I suspect that some change made after I wrote the code
introduced the problem.

So I tried restoring the old code with the attached patch, and the
patched Emacs renders the above Bengali string without problems.

The patch reverts this change:
------------------------------------------------------------
commit 04ac097f34d887e1ae8dea1e884118728e931c7a
Author: Paul Eggert <eggert <at> cs.ucla.edu>
Date:   Fri Nov 13 12:02:21 2015 -0800

    Spruce up ftfont.c memory allocation
    
    * src/ftfont.c (setup_otf_gstring):
    Avoid O(N**2) behavior when reallocating.
    (ftfont_shape_by_flt): Prefer xpalloc to xrealloc when
    reallocating buffers; this simplifies the code.  Do not trust
    mflt_run to leave the output areas unchanged on failure, as
    this isn't part of its interface spec.
------------------------------------------------------------

At the moment, though, I don't know why the new code does not work.

---
K. Handa
handa <at> gnu.org

diff --git a/src/ftfont.c b/src/ftfont.c
index 0603dd9ce6..26198928d8 100644
--- a/src/ftfont.c
+++ b/src/ftfont.c
@@ -2720,6 +2720,37 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
 	}
     }
 
+#define RESTORE_OLD_CODE
+#ifdef RESTORE_OLD_CODE
+  if (gstring.allocated == 0)
+    {
+      gstring.glyph_size = sizeof (MFLTGlyph);
+      gstring.glyphs = xnmalloc (len * 2, sizeof *gstring.glyphs);
+      gstring.allocated = len * 2;
+    }
+  else if (gstring.allocated < len * 2)
+    {
+      gstring.glyphs = xnrealloc (gstring.glyphs, len * 2,
+				  sizeof *gstring.glyphs);
+      gstring.allocated = len * 2;
+    }
+  memset (gstring.glyphs, 0, len * sizeof *gstring.glyphs);
+  for (i = 0; i < len; i++)
+    {
+      Lisp_Object g = LGSTRING_GLYPH (lgstring, i);
+
+      gstring.glyphs[i].c = LGLYPH_CHAR (g);
+      if (with_variation_selector)
+	{
+	  gstring.glyphs[i].code = LGLYPH_CODE (g);
+	  gstring.glyphs[i].encoded = 1;
+	}
+    }
+
+  gstring.used = len;
+  gstring.r2l = 0;
+#endif
+
   {
     Lisp_Object family = Ffont_get (LGSTRING_FONT (lgstring), QCfamily);
 
@@ -2763,6 +2794,20 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
 	return make_fixnum (0);
     }
 
+#ifdef RESTORE_OLD_CODE
+  for (i = 0; i < 3; i++)
+    {
+      int result = mflt_run (&gstring, 0, len, &flt_font_ft.flt_font, flt);
+      if (result != -2)
+	break;
+      int len2;
+      if (INT_MULTIPLY_WRAPV (gstring.allocated, 2, &len2))
+	memory_full (SIZE_MAX);
+      gstring.glyphs = xnrealloc (gstring.glyphs,
+				  gstring.allocated, 2 * sizeof (MFLTGlyphFT));
+      gstring.allocated = len2;
+    }
+#else
   MFLTGlyphFT *glyphs = (MFLTGlyphFT *) gstring.glyphs;
   ptrdiff_t allocated = gstring.allocated;
   ptrdiff_t incr_min = len - allocated;
@@ -2795,6 +2840,7 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
       gstring.r2l = 0;
     }
   while (mflt_run (&gstring, 0, len, &flt_font_ft.flt_font, flt) == -2);
+#endif
 
   if (gstring.used > LGSTRING_GLYPH_LEN (lgstring))
     return Qnil;
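
Editorial sketch, not part of the original message: the restored loop in
the patch above re-runs the shaper while it reports -2 (taken here to
mean "output buffer too small") and doubles the buffer between runs,
giving up after a fixed number of tries.  The C below shows that pattern
in isolation under stated assumptions.  fake_shape and shape_with_retry
are hypothetical names; fake_shape only stands in for mflt_run, and the
overflow check plays the role of INT_MULTIPLY_WRAPV.  Unlike the patch,
the sketch counts shaper runs and reports failure when the last run
still does not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shaper such as mflt_run: it needs NEED
   output slots and returns -2 when ALLOCATED is smaller than that.  */
static int
fake_shape (size_t allocated, size_t need)
{
  return allocated < need ? -2 : 0;
}

/* Run the shaper at most MAX_TRIES times, doubling the buffer after
   each -2.  Return the buffer (caller frees) on success, NULL on
   failure.  *ALLOCATED is updated to the capacity reached.  */
static int *
shape_with_retry (size_t *allocated, size_t need, int max_tries)
{
  assert (*allocated > 0);
  int *buf = malloc (*allocated * sizeof *buf);
  if (!buf)
    return NULL;

  for (int i = 0; i < max_tries; i++)
    {
      if (fake_shape (*allocated, need) != -2)
        return buf;

      /* Refuse to double if the byte count would overflow size_t.  */
      if (*allocated > SIZE_MAX / 2 / sizeof *buf)
        break;

      int *grown = realloc (buf, *allocated * 2 * sizeof *buf);
      if (!grown)
        break;
      buf = grown;
      *allocated *= 2;
    }

  free (buf);
  return NULL;
}
```

Starting from 4 slots with 3 tries, a request for 10 slots succeeds once
the buffer has grown to 16, while a request for 100 slots fails.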




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Sun, 27 Jun 2021 06:21:02 GMT) Full text and rfc822 format available.

Message #35 received at 49066 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: handa <handa <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, rpluim <at> gmail.com, eggert <at> cs.ucla.edu, larsi <at> gnus.org,
 mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Sun, 27 Jun 2021 09:20:41 +0300
> From: handa <handa <at> gnu.org>
> Cc: rpluim <at> gmail.com, larsi <at> gnus.org, 49066 <at> debbugs.gnu.org,
>  mvsfrasson <at> gmail.com, eggert <at> cs.ucla.edu
> Date: Sun, 27 Jun 2021 11:29:28 +0900
> 
> So I restored the old code with the attached patch, and the patched
> Emacs renders the above Bengali string without problems.

Thanks.  Robert, Miguel: could you please try this patch and see if it
fixes the problem?

Since we are moving away from m17n-flt, I don't think we should optimize
memory management when m17n-flt is used, especially if that causes
problems.  So if the patch fixes the crash, I think we should install
it.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Sun, 27 Jun 2021 18:03:01 GMT) Full text and rfc822 format available.

Message #38 received at 49066 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>, handa <handa <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, rpluim <at> gmail.com, larsi <at> gnus.org,
 mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Sun, 27 Jun 2021 11:02:26 -0700
On 6/26/21 11:20 PM, Eli Zaretskii wrote:
> Since we are moving away from m17n-flt, I don't think we should optimize
> memory management when m17n-flt is used, especially if that causes
> problems.  So if the patch fixes the crash, I think we should install
> it.

Sure, and I can volunteer to do that. Would you like me to do it in 
master now, or wait for confirmation and install it on the emacs-27 
branch? or perhaps some other course of action?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Sun, 27 Jun 2021 19:17:02 GMT) Full text and rfc822 format available.

Message #41 received at 49066 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 49066 <at> debbugs.gnu.org, handa <at> gnu.org, rpluim <at> gmail.com, larsi <at> gnus.org,
 mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Sun, 27 Jun 2021 22:15:50 +0300
> Cc: rpluim <at> gmail.com, larsi <at> gnus.org, 49066 <at> debbugs.gnu.org,
>  mvsfrasson <at> gmail.com
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sun, 27 Jun 2021 11:02:26 -0700
> 
> On 6/26/21 11:20 PM, Eli Zaretskii wrote:
> > Since we are moving away from m17n-flt, I don't think we should optimize
> > memory management when m17n-flt is used, especially if that causes
> > problems.  So if the patch fixes the crash, I think we should install
> > it.
> 
> Sure, and I can volunteer to do that. Would you like me to do it in 
> master now, or wait for confirmation and install it on the emacs-27 
> branch? or perhaps some other course of action?

I'd like to see the confirmation, and then install this on master.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Mon, 28 Jun 2021 10:57:02 GMT) Full text and rfc822 format available.

Message #44 received at 49066 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, handa <at> gnu.org, larsi <at> gnus.org,
 Paul Eggert <eggert <at> cs.ucla.edu>, mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Mon, 28 Jun 2021 12:56:06 +0200
>>>>> On Sun, 27 Jun 2021 22:15:50 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> Cc: rpluim <at> gmail.com, larsi <at> gnus.org, 49066 <at> debbugs.gnu.org,
    >> mvsfrasson <at> gmail.com
    >> From: Paul Eggert <eggert <at> cs.ucla.edu>
    >> Date: Sun, 27 Jun 2021 11:02:26 -0700
    >> 
    >> On 6/26/21 11:20 PM, Eli Zaretskii wrote:
    >> > Since we are moving away from m17n-flt, I don't think we should optimize
    >> > memory management when m17n-flt is used, especially if that causes
    >> > problems.  So if the patch fixes the crash, I think we should install
    >> > it.
    >> 
    >> Sure, and I can volunteer to do that. Would you like me to do it in 
    >> master now, or wait for confirmation and install it on the emacs-27 
    >> branch? or perhaps some other course of action?

    Eli> I'd like to see the confirmation, and then install this on master.

    Eli> Thanks.

With the patch it still crashes for me in emacs-master with harfbuzz disabled:

Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
0x000055555576d4e7 in AREF (array=XIL(0), idx=1) at lisp.h:1838
1838	  return XVECTOR (array)->contents[idx];
(gdb) bt
#0  0x000055555576d4e7 in AREF (array=XIL(0), idx=1) at lisp.h:1838
#1  0x0000555555774be0 in ftfont_shape_by_flt
    (lgstring=XIL(0x7ffff1e5301d), font=0x55555604f410, ft_face=0x5555566a2400, otf=0x555556696b60, matrix=0x55555604f508) at ftfont.c:2852
#2  0x0000555555775002 in ftfont_shape (lgstring=XIL(0x7ffff1e5301d), direction=XIL(0)) at ftfont.c:2890
#3  0x000055555577629e in ftcrfont_shape (lgstring=XIL(0x7ffff1e5301d), direction=XIL(0)) at ftcrfont.c:477
#4  0x000055555571344c in Ffont_shape_gstring (gstring=XIL(0x7ffff1e5301d), direction=XIL(0)) at font.c:4499
#5  0x00005555557019fb in Ffuncall (nargs=3, args=args <at> entry=0x7fffffffd670) at eval.c:3039
#6  0x000055555573cdf8 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>) at bytecode.c:632
#7  0x0000555555701937 in Ffuncall (nargs=nargs <at> entry=7, args=args <at> entry=0x7fffffffd990) at eval.c:3055
#8  0x0000555555700cf9 in internal_condition_case_n (bfun=
    0x555555701760 <Ffuncall>, nargs=nargs <at> entry=7, args=args <at> entry=0x7fffffffd990, handlers=handlers <at> entry=XIL(0x30), hfun=hfun <at> entry=
    0x5555555ca5e0 <safe_eval_handler>) at eval.c:1642
#9  0x00005555555b8603 in safe__call
    (inhibit_quit=inhibit_quit <at> entry=false, nargs=nargs <at> entry=7, func=<optimized out>, ap=ap <at> entry=0x7fffffffda28) at lisp.h:1002
#10 0x00005555555c79b5 in safe_call (nargs=nargs <at> entry=7, func=<optimized out>) at xdisp.c:3009
#11 0x00005555557609c5 in autocmp_chars
    (rule=XIL(0x7ffff1e501bd), charpos=charpos <at> entry=146, bytepos=<optimized out>, limit=<optimized out>, 
    limit <at> entry=148, win=win <at> entry=0x555556030100, face=face <at> entry=0x0, string=XIL(0), direction=XIL(0)) at lisp.h:731
#12 0x000055555576426d in find_automatic_composition (pos=pos <at> entry=146, limit=146, 
    limit <at> entry=-1, backlim=backlim <at> entry=-1, start=start <at> entry=0x7fffffffdc68, end=end <at> entry=0x7fffffffdc70, gstring=gstring <at> entry=0x7fffffffdc78, string=XIL(0)) at composite.c:1661
#13 0x0000555555764f39 in composition_adjust_point (last_pt=last_pt <at> entry=146, new_pt=new_pt <at> entry=146) at lisp.h:1002
#14 0x00005555556960ff in command_loop_1 () at keyboard.c:1569
#15 0x00005555557009d7 in internal_condition_case
    (bfun=bfun <at> entry=0x555555695020 <command_loop_1>, handlers=handlers <at> entry=XIL(0x90), hfun=hfun <at> entry=0x55555568bac0 <cmd_error>)
    at eval.c:1478
#16 0x0000555555686064 in command_loop_2 (ignore=ignore <at> entry=XIL(0)) at lisp.h:1002
#17 0x0000555555702ed3 in internal_catch (tag=tag <at> entry=XIL(0xe520), func=func <at> entry=0x555555686040 <command_loop_2>, arg=arg <at> entry=XIL(0))
    at eval.c:1198
#18 0x000055555568600b in command_loop () at lisp.h:1002
#19 0x000055555568b6d6 in recursive_edit_1 () at keyboard.c:720
#20 0x000055555568ba02 in Frecursive_edit () at keyboard.c:789
#21 0x00005555555a177f in main (argc=2, argv=<optimized out>) at emacs.c:2308

Lisp Backtrace:
"font-shape-gstring" (0xffffd678)
"auto-compose-chars" (0xffffd998)
(gdb) up
#1  0x0000555555774be0 in ftfont_shape_by_flt (lgstring=XIL(0x7ffff1e5301d), font=0x55555604f410, ft_face=0x5555566a2400, 
    otf=0x555556696b60, matrix=0x55555604f508) at ftfont.c:2852
2852	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
(gdb) up
#2  0x0000555555775002 in ftfont_shape (lgstring=XIL(0x7ffff1e5301d), direction=XIL(0)) at ftfont.c:2890
2890	  return ftfont_shape_by_flt (lgstring, font, ftfont_info->ft_size->face, otf,
(gdb) pp lgstring
[[#<font-object "-GOOG-Noto Sans Bengali-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1"> 2453 8204] nil [0 0 2453 20 16 -1 16 12 0 nil] [1 1 8204 658 0 -1 1 15 4 nil] nil nil nil nil nil nil]
(gdb) down
#1  0x0000555555774be0 in ftfont_shape_by_flt (lgstring=XIL(0x7ffff1e5301d), font=0x55555604f410, ft_face=0x5555566a2400, 
    otf=0x555556696b60, matrix=0x55555604f508) at ftfont.c:2852
2852	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
(gdb) p *g
$1 = {
  g = {
    c = 2453,
    code = 0,
    from = 0,
    to = 2,
    xadv = 704,
    yadv = 0,
    ascent = 896,
    descent = 0,
    lbearing = 64,
    rbearing = 640,
    xoff = 0,
    yoff = 0,
    encoded = 1,
    measured = 1,
    adjusted = 0,
    internal = 1073741823
  },
  libotf_positioning_type = 8204
}

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Mon, 28 Jun 2021 12:06:02 GMT) Full text and rfc822 format available.

Message #47 received at 49066 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 49066 <at> debbugs.gnu.org, handa <at> gnu.org, larsi <at> gnus.org, eggert <at> cs.ucla.edu,
 mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Mon, 28 Jun 2021 15:05:33 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>,  handa <at> gnu.org,  larsi <at> gnus.org,
>   49066 <at> debbugs.gnu.org,  mvsfrasson <at> gmail.com
> Date: Mon, 28 Jun 2021 12:56:06 +0200
> 
>     Eli> I'd like to see the confirmation, and then install this on master.
> 
>     Eli> Thanks.
> 
> With the patch it still crashes for me in emacs-master with harfbuzz disabled:

Too bad.

Kenichi, any suggestions?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Sat, 03 Jul 2021 02:06:01 GMT) Full text and rfc822 format available.

Message #50 received at 49066 <at> debbugs.gnu.org (full text, mbox):

From: handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, rpluim <at> gmail.com, eggert <at> cs.ucla.edu, larsi <at> gnus.org,
 mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Sat, 03 Jul 2021 11:05:05 +0900
In article <83bl7qp52q.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> > With the patch it still crashes for me in emacs-master with harfbuzz disabled:

> Too bad.
> Kenichi, any suggestions?

I checked the code again, and found that it was a fault of m17n-lib
which was not robust enough to handle an OTF table that is different
from what the library expects.

Here is a revised patch to handle such a case.  Could you please try it?

------------------------------------------------------------
diff --git a/src/ftfont.c b/src/ftfont.c
index 0603dd9ce6..12d0d72d27 100644
--- a/src/ftfont.c
+++ b/src/ftfont.c
@@ -2798,10 +2798,31 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
 
   if (gstring.used > LGSTRING_GLYPH_LEN (lgstring))
     return Qnil;
+
+  /* mflt_run may fail to set g->g.to (which must be a valid index
+     into lgstring) correctly if the font has an OTF table that is
+     different from what the m17n library expects. */
   for (i = 0; i < gstring.used; i++)
     {
       MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;
+      if (g->g.to >= len)
+	{
+	  /* Invalid g->g.to. */
+	  g->g.to = len - 1;
+	  int from = g->g.from;
+	  /* Fix remaining glyphs. */
+	  for (++i; i < gstring.used; i++)
+	    {
+	      g = (MFLTGlyphFT *) (gstring.glyphs) + i;
+	      g->g.from = from;
+	      g->g.to = len - 1;
+	    }
+	}
+    }
 
+  for (i = 0; i < gstring.used; i++)
+    {
+      MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;
       g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
       g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
     }
------------------------------------------------------------
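
Editorial sketch, not part of the original message: the clamping step
of the patch above can be tried in isolation.  struct span and
clamp_spans are hypothetical, simplified stand-ins for the from/to
fields of MFLTGlyph and for the new loop in ftfont_shape_by_flt.  LEN is
the number of input glyphs, so valid indices are 0 .. LEN - 1.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the from/to fields of
   MFLTGlyph: each output glyph covers input glyphs FROM .. TO.  */
struct span
{
  int from;
  int to;
};

/* Repair spans whose TO points past the last input glyph.  The first
   bad span is clamped to LEN - 1; every later span is merged into the
   same cluster by copying that span's FROM and ending at LEN - 1.  */
static void
clamp_spans (struct span *g, int used, int len)
{
  assert (len > 0);
  for (int i = 0; i < used; i++)
    if (g[i].to >= len)
      {
        int from = g[i].from;
        g[i].to = len - 1;
        /* Fix the remaining spans; this also ends the outer loop.  */
        for (++i; i < used; i++)
          {
            g[i].from = from;
            g[i].to = len - 1;
          }
      }
}
```

With the values from the gdb session earlier in this report (two input
glyphs, one output glyph with from = 0 and to = 2), the span becomes
0 .. 1, so the later LGSTRING_GLYPH lookup no longer reaches a nil slot.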

> Btw, I think there's a bug in those patterns: ZWJ and ZWNJ shouldn't
> compose unless they are followed by a character.  See section 12.2 in
> the Unicode Standard.

Even if they should not be composed with, we must include them in the
string to shape because their existence may change the glyph of the
previous character.  A shaper (m17n-lib or harfbuzz) must return a glyph
string that has an independent grapheme cluster for the last ZWJ/ZWNJ.

At the time of developing m17n-lib, the above rule was not clear.  To
conform to that rule, please put the attached BNG2-OTF.flt under the
directory ~/.m17n.d/.

---
K. Handa
handa <at> gnu.org

[BNG2-OTF.flt (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Mon, 05 Jul 2021 09:29:02 GMT) Full text and rfc822 format available.

Message #53 received at 49066 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: handa <handa <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu,
 larsi <at> gnus.org, mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Mon, 05 Jul 2021 11:28:43 +0200
>>>>> On Sat, 03 Jul 2021 11:05:05 +0900, handa <handa <at> gnu.org> said:

    handa> In article <83bl7qp52q.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
    >> > With the patch it still crashes for me in emacs-master with harfbuzz disabled:

    >> Too bad.
    >> Kenichi, any suggestions?

    handa> I checked the code again, and found that it was a fault of m17n-lib
    handa> which was not robust enough to handle an OTF table that is different
    handa> from what the library expects.

    handa> Here is a revised patch to handle such a case.  Could you please try it?

Thanks, that fixes the crash, and results in the ZWNJ being composed.

    >> Btw, I think there's a bug in those patterns: ZWJ and ZWNJ shouldn't
    >> compose unless they are followed by a character.  See section 12.2 in
    >> the Unicode Standard.

    handa> Even if they should not be composed with, we must include them in the
    handa> string to shape because their existence may change the glyph of the
    handa> previous character.  A shaper (m17n-lib or harfbuzz) must return a glyph
    handa> string that has an independent grapheme cluster for the last ZWJ/ZWNJ.

    handa> At the time of developing m17n-lib, the above rule was not clear.  To
    handa> conform to that rule, please put the attached BNG2-OTF.flt under the
    handa> directory ~/.m17n.d/.

I believe you, but I did not test this specifically.

Robert
-- 




Added tag(s) patch. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Mon, 05 Jul 2021 13:28:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#49066; Package emacs. (Tue, 20 Jul 2021 12:24:02 GMT) Full text and rfc822 format available.

Message #58 received at 49066 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 49066 <at> debbugs.gnu.org, handa <handa <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 eggert <at> cs.ucla.edu, mvsfrasson <at> gmail.com
Subject: Re: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Tue, 20 Jul 2021 14:23:40 +0200
Robert Pluim <rpluim <at> gmail.com> writes:

>     handa> Here is a revised patch to handle such a case.  Could you
>     handa> please try it?
>
> Thanks, that fixes the crash, and results in the ZWNJ being composed.

I see that the patch wasn't applied, so I have now pushed it to Emacs 28.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 28.1, send any further explanations to 49066 <at> debbugs.gnu.org and "Miguel V. S. Frasson" <mvsfrasson <at> gmail.com> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Tue, 20 Jul 2021 12:24:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 18 Aug 2021 11:24:10 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 244 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.