GNU bug report logs - #36507
27.0.50; Crash on evaluating invalid UTF-8 byte sequence on MacOS

Package: emacs;

Reported by: Stefan Kangas <stefan <at> marxist.se>

Date: Fri, 5 Jul 2019 02:05:02 UTC

Severity: normal

Found in version 27.0.50

Done: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36507 in the body.
You can then email your comments to 36507 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#36507; Package emacs. (Fri, 05 Jul 2019 02:05:02 GMT)

Acknowledgement sent to Stefan Kangas <stefan <at> marxist.se>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 05 Jul 2019 02:05:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.50; Crash on evaluating invalid UTF-8 byte sequence on MacOS
Date: Fri, 5 Jul 2019 04:04:21 +0200
When evaluating the following expression, I get a crash under "emacs -Q"
compiled from current master.

(decode-coding-string "\xE3\x32\x9A\x36" 'chinese-gb18030)

The same expression is already exercised in batch mode without problems on
the same system; it is now on master in test/lisp/bookmark-tests.el:281.

The expression was suggested in Bug#36452, where

Eli Zaretskii <eliz <at> gnu.org> writes:
> Please add to that text something that doesn't yield valid
> UTF-8 byte sequence.  For example, these two strings:
>
>   (decode-coding-string "\xE3\x32\x9A\x36" 'chinese-gb18030)

I think the issue as such is beyond me, but I can reproduce this every time.
Please let me know if you need help testing or more information.
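
For readers wondering why this particular byte sequence is special, the
arithmetic can be sketched in C.  Four-byte GB18030 sequences form a
mixed-radix number (bytes 1 and 3 range over 0x81-0xFE, bytes 2 and 4
over 0x30-0x39), and the supplementary planes start at 0x90308130, which
is U+10000.  Under that mapping E3 32 9A 35 is U+10FFFF, so E3 32 9A 36
lands one position past the end of Unicode.  The function names below
are made up for illustration, and this is not Emacs's decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Linear index of a four-byte GB18030 sequence, or -1 if any byte lies
   outside the four-byte code space.  */
static long
gb18030_linear (const uint8_t b[4])
{
  if (b[0] < 0x81 || b[0] > 0xFE || b[1] < 0x30 || b[1] > 0x39
      || b[2] < 0x81 || b[2] > 0xFE || b[3] < 0x30 || b[3] > 0x39)
    return -1;
  /* Radices, from the first byte to the last: 126, 10, 126, 10.  */
  return (((long) (b[0] - 0x81) * 10 + (b[1] - 0x30)) * 126
          + (b[2] - 0x81)) * 10 + (b[3] - 0x30);
}

/* Linear mapping used for the supplementary planes: 0x90308130 is
   U+10000.  Returns -1 for sequences below that base, which need a
   lookup table instead.  The result is NOT clamped to U+10FFFF.  */
long
gb18030_supplementary_code (const uint8_t b[4])
{
  static const uint8_t base[4] = { 0x90, 0x30, 0x81, 0x30 };
  long lin = gb18030_linear (b);
  long lin_base = gb18030_linear (base);
  if (lin < lin_base)
    return -1;
  return 0x10000 + (lin - lin_base);
}

/* Hypothetical demo: the last Unicode code point and the sequence
   from this report.  */
void
gb18030_demo (void)
{
  const uint8_t last_unicode[4] = { 0xE3, 0x32, 0x9A, 0x35 };
  const uint8_t reported[4] = { 0xE3, 0x32, 0x9A, 0x36 };
  assert (gb18030_supplementary_code (last_unicode) == 0x10FFFF);
  assert (gb18030_supplementary_code (reported) == 0x110000);
}
```

Calling gb18030_demo () from a main function should pass both
assertions.  How Emacs numbers such out-of-Unicode characters
internally is a separate matter; the backtrace in this report shows it
using c=2246732 for this one.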

Before crash, I get this output:
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007fff8ddbd326 in CFCharacterSetIsLongCharacterMember () from /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation

Here is the stack trace, and report-emacs-bug info below:

(gdb) bt
#0  0x00007fff8ddbd326 in CFCharacterSetIsLongCharacterMember () from /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
#1  0x0000000100437f31 in macfont_has_char (font=XIL(0x101937625), c=2246732) at macfont.m:2727
#2  0x000000010030bd9d in font_has_char (f=0x102849430, font=XIL(0x101937625), c=2246732) at font.c:3002
#3  0x00000001003d62c5 in fontset_find_font (fontset=XIL(0x10482b515), c=2246732, face=0x101163460, charset_id=-1, fallback=true) at fontset.c:676
#4  0x00000001003ccc19 in fontset_font (fontset=XIL(0x10189f835), c=2246732, face=0x101163460, id=-1) at fontset.c:799
#5  0x00000001003cc48c in face_for_char (f=0x102849430, face=0x101163460, c=2246732, pos=4, object=XIL(0)) at fontset.c:989
#6  0x00000001001a1fd3 in FACE_FOR_CHAR (f=0x102849430, face=0x101163460, character=2246732, pos=4, object=XIL(0)) at ./dispextern.h:1846
#7  0x0000000100040f2d in get_next_display_element (it=0x7fff5fbfd980) at xdisp.c:7447
#8  0x000000010004396b in move_it_in_display_line_to (it=0x7fff5fbfd980, to_charpos=42, to_x=-1, op=MOVE_TO_POS) at xdisp.c:8933
#9  0x000000010003f618 in move_it_to (it=0x7fff5fbfd980, to_charpos=42, to_x=-1, to_y=-1, to_vpos=-1, op=8) at xdisp.c:9683
#10 0x000000010005f2df in resize_mini_window (w=0x101836220, exact_p=true) at xdisp.c:11447
#11 0x000000010005be65 in resize_mini_window_1 (a1=4320354848, exactly=XIL(0xb970)) at xdisp.c:11364
#12 0x000000010005bce9 in with_echo_area_buffer (w=0x101836220, which=0, fn=0x10005be10 <resize_mini_window_1>, a1=4320354848, a2=XIL(0xb970)) at xdisp.c:11086
#13 0x000000010005b7cc in resize_echo_area_exactly () at xdisp.c:11342
#14 0x00000001001ac700 in command_loop_1 () at keyboard.c:1484
#15 0x00000001002d43af in internal_condition_case (bfun=0x1001ab850 <command_loop_1>, handlers=XIL(0x4c50), hfun=0x1001ca1a0 <cmd_error>) at eval.c:1352
#16 0x00000001001ca081 in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
#17 0x00000001002d3508 in internal_catch (tag=XIL(0xbfd0), func=0x1001ca050 <command_loop_2>, arg=XIL(0)) at eval.c:1113
#18 0x00000001001aab25 in command_loop () at keyboard.c:1070
#19 0x00000001001aa927 in recursive_edit_1 () at keyboard.c:714
#20 0x00000001001aad76 in Frecursive_edit () at keyboard.c:786
#21 0x00000001001a7e27 in main (argc=2, argv=0x7fff5fbffad8) at emacs.c:2103
[New Thread 0x20db of process 22966]
[New Thread 0x2203 of process 22966]
[New Thread 0x145b of process 22966]


(gdb) xbacktrace
(gdb)
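
The argument c=2246732 that frames #1 through #6 pass down is easier to
interpret in hexadecimal.  The small C sketch below compares it with two
limits: the top of the Unicode range, and the larger range that Emacs
allows for its internal characters.  Both constants are stated from my
understanding of src/character.h and should be treated as assumptions;
the names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limits: the last Unicode code point, and the largest
   character code Emacs uses internally.  */
enum { UNICODE_LIMIT = 0x10FFFF, EMACS_CHAR_LIMIT = 0x3FFFFF };

/* The value of `c' in the backtrace, rewritten in hexadecimal.  */
static_assert (2246732 == 0x22484C, "c from the backtrace, in hex");

/* True if C is a code point that Unicode-only APIs can be given.  */
bool
in_unicode_range (long c)
{
  return 0 <= c && c <= UNICODE_LIMIT;
}

/* True if C is within the wider internal character range.  */
bool
in_emacs_char_range (long c)
{
  return 0 <= c && c <= EMACS_CHAR_LIMIT;
}
```

With these definitions, in_emacs_char_range (2246732) holds while
in_unicode_range (2246732) does not, which would be consistent with a
character that redisplay handles normally but that a Unicode-only
CoreFoundation call is not prepared for.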


In GNU Emacs 27.0.50 (build 1, x86_64-apple-darwin15.6.0, NS
appkit-1404.47 Version 10.11.6 (Build 15G22010))
 of 2019-07-05 built on Stefans-MBP
Repository revision: 44f199648b0c986a0ac7608f4e9d803c619ae2d6
Repository branch: master
Windowing system distributor 'Apple', version 10.3.1404
System Description:  Mac OS X 10.11.6

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --without-makeinfo --enable-checking=yes,glyphs
 --enable-check-lisp-object-type 'CFLAGS=-O0 -g3''

Configured features:
NOTIFY KQUEUE ACL GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS THREADS
PDUMPER LCMS2 GMP

Important settings:
  value of $LANG: en_SE <at> calendar=iso8601.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs text-property-search time-date
seq byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs
cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
elec-pair tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/ns-win ns-win ucs-normalize mule-util
term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow isearch timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads kqueue cocoa ns lcms2 multi-tty make-network-process
emacs)

Memory information:
((conses 16 44089 5693)
 (symbols 48 5808 1)
 (strings 32 15104 1574)
 (string-bytes 1 497022)
 (vectors 16 9842)
 (vector-slots 8 115136 11088)
 (floats 8 17 25)
 (intervals 56 183 0)
 (buffers 992 11))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36507; Package emacs. (Fri, 05 Jul 2019 02:23:02 GMT)

Message #8 received at 36507 <at> debbugs.gnu.org:

From: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: 36507 <at> debbugs.gnu.org
Subject: Re: bug#36507: 27.0.50; Crash on evaluating invalid UTF-8 byte sequence on MacOS
Date: Fri, 05 Jul 2019 11:22:45 +0900
On Fri, 05 Jul 2019 11:04:21 +0900,
Stefan Kangas wrote:
> 
> When evaluating the following expression, I get a crash under "emacs -Q"
> compiled from current master.
> 
> (decode-coding-string "\xE3\x32\x9A\x36" 'chinese-gb18030)
> 
> This expression is tested in batch mode with no problems on the same
> system, now on master in test/lisp/bookmark-tests.el:281.
> 
> The expression was suggested in Bug#36452, where
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> > Please add to that text something that doesn't yield valid
> > UTF-8 byte sequence.  For example, these two strings:
> >
> >   (decode-coding-string "\xE3\x32\x9A\x36" 'chinese-gb18030)
> 
> I think the issue as such is beyond me, but I can reproduce this every time.
> Please let me know if you need help testing or more information.
> 
> Before crash, I get this output:
> Thread 1 received signal SIGSEGV, Segmentation fault.
> 0x00007fff8ddbd326 in CFCharacterSetIsLongCharacterMember () from
> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation

Please try the patch below.

				     YAMAMOTO Mitsuharu
				mituharu <at> math.s.chiba-u.ac.jp

diff --git a/src/macfont.m b/src/macfont.m
index f736fbf0e1e..2b7f963fd61 100644
--- a/src/macfont.m
+++ b/src/macfont.m
@@ -2076,7 +2076,7 @@ static int macfont_variation_glyphs (struct font *, int c,
               ptrdiff_t j;
 
               for (j = 0; j < ASIZE (chars); j++)
-                if (TYPE_RANGED_FIXNUMP (UTF32Char, AREF (chars, j))
+                if (RANGED_FIXNUMP (0, AREF (chars, j), MAX_UNICODE_CHAR)
                     && CFCharacterSetIsLongCharacterMember (desc_charset,
                                                             XFIXNAT (AREF (chars, j))))
                   break;
@@ -2710,6 +2710,9 @@ So we use CTFontDescriptorCreateMatchingFontDescriptor (no
   int result;
   CFCharacterSetRef charset;
 
+  if (c < 0 || c > MAX_UNICODE_CHAR)
+    return false;
+
   block_input ();
   if (FONT_ENTITY_P (font))
     {
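
The shape of the second hunk, an early return before the value reaches a
lookup that only handles Unicode code points, can be shown in isolation.
The C sketch below uses a toy per-plane table as a stand-in for the
platform character set.  That table is purely illustrative and says
nothing about how CoreFoundation is implemented; all names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { UNICODE_MAX = 0x10FFFF, PLANE_COUNT = 17 };
static_assert (PLANE_COUNT == (UNICODE_MAX >> 16) + 1,
               "one table slot per Unicode plane");

/* Toy character set: records only which planes have any member.  */
struct toy_charset
{
  bool plane_present[PLANE_COUNT];
};

/* Unchecked lookup.  Indexing is out of bounds for c > UNICODE_MAX,
   so callers must validate first.  */
static bool
toy_charset_contains (const struct toy_charset *cs, unsigned long c)
{
  return cs->plane_present[c >> 16];
}

/* Guarded wrapper in the style of the patch: reject anything outside
   0..UNICODE_MAX before calling the unchecked lookup.  */
bool
toy_font_has_char (const struct toy_charset *cs, long c)
{
  if (c < 0 || c > UNICODE_MAX)
    return false;
  return toy_charset_contains (cs, (unsigned long) c);
}
```

With a set that covers only plane 1, toy_font_has_char returns true for
0x1F600 and false for 2246732, without the lookup ever seeing the
out-of-range value.  The first hunk applies the same bounds through
RANGED_FIXNUMP (0, ..., MAX_UNICODE_CHAR) in place of a check against
the range of the UTF32Char type.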




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36507; Package emacs. (Fri, 05 Jul 2019 11:37:01 GMT)

Message #11 received at 36507 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
Cc: 36507 <at> debbugs.gnu.org
Subject: Re: bug#36507: 27.0.50; Crash on evaluating invalid UTF-8 byte sequence on MacOS
Date: Fri, 5 Jul 2019 13:36:34 +0200
YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp> writes:
> > >   (decode-coding-string "\xE3\x32\x9A\x36" 'chinese-gb18030)
> >
> > I think the issue as such is beyond me, but I can reproduce this every time.
> > Please let me know if you need help testing or more information.
> >
> > Before crash, I get this output:
> > Thread 1 received signal SIGSEGV, Segmentation fault.
> > 0x00007fff8ddbd326 in CFCharacterSetIsLongCharacterMember () from
> > /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
>
> Please try the patch below.

The patch works; I no longer get the crash.  The return value is now:

    "#(" " 0 1 (charset gb18030-4-byte-ext-2))"

Note that the " " is a visually wide white space character that I
can't copy to other programs for some reason.  It is here replaced
with a space.  Not sure if this is expected or not.

Thank you for providing a fix so swiftly.

Best regards,
Stefan Kangas




Reply sent to YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>:
You have taken responsibility. (Sat, 06 Jul 2019 05:27:02 GMT)

Notification sent to Stefan Kangas <stefan <at> marxist.se>:
bug acknowledged by developer. (Sat, 06 Jul 2019 05:27:02 GMT)

Message #16 received at 36507-done <at> debbugs.gnu.org:

From: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: 36507-done <at> debbugs.gnu.org
Subject: Re: bug#36507: 27.0.50; Crash on evaluating invalid UTF-8 byte sequence on MacOS
Date: Sat, 06 Jul 2019 14:26:27 +0900
On Fri, 05 Jul 2019 20:36:34 +0900,
Stefan Kangas wrote:
> 
> YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp> writes:
> > > >   (decode-coding-string "\xE3\x32\x9A\x36" 'chinese-gb18030)
> > >
> > > I think the issue as such is beyond me, but I can reproduce this every time.
> > > Please let me know if you need help testing or more information.
> > >
> > > Before crash, I get this output:
> > > Thread 1 received signal SIGSEGV, Segmentation fault.
> > > 0x00007fff8ddbd326 in CFCharacterSetIsLongCharacterMember () from
> > > /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
> >
> > Please try the patch below.
> 
> The patch works; I no longer get the crash.  The return value is now:
> 
>     "#(" " 0 1 (charset gb18030-4-byte-ext-2))"

Thanks.  I pushed the patch to master and the emacs-26 branch as
0e15bd11dc0 and f0db687a285, respectively.  (I forgot to add the bug
ID to commit log for the former.)  Closing the bug.

> Note that the " " is a visually wide white space character that I
> can't copy to other programs for some reason.  It is here replaced
> with a space.  Not sure if this is expected or not.

On the Mac port, from which macfont.m originally came, the character
is displayed with boxed hexadecimal.  So, this would be another issue
specific to the NS port.

				     YAMAMOTO Mitsuharu
				mituharu <at> math.s.chiba-u.ac.jp




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 03 Aug 2019 11:24:04 GMT)

This bug report was last modified 4 years and 266 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.