GNU bug report logs - #50718
28.0.50; `split-string` fails on certain unicode strings

Package: emacs;

Reported by: dalanicolai <dalanicolai <at> gmail.com>

Date: Tue, 21 Sep 2021 09:29:02 UTC

Severity: normal

Tags: notabug

Found in version 28.0.50

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 50718 in the body.
You can then email your comments to 50718 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#50718; Package emacs. (Tue, 21 Sep 2021 09:29:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to dalanicolai <dalanicolai <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 21 Sep 2021 09:29:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: dalanicolai <dalanicolai <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.0.50; `split-string` fails on certain unicode strings
Date: Tue, 21 Sep 2021 11:27:48 +0200
Evaluate: (split-string "१०.३" ".")
It wrongly returns a list containing only empty strings.
Of course it should return a list with the individual Devanagari numbers.


In GNU Emacs 28.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version 3.24.30,
cairo version 1.17.4)
 of 2021-09-06 built on daniel-fedora
Repository revision: c4724add006e62b81f847937db56335a81bdcc74
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Fedora 34 (Workstation Edition)

Configured using:
 'configure --with-mailutils --with-cairo --with-modules --with-pgtk
 --with-native-compilation'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES
NATIVE_COMP NOTIFY INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF
TOOLKIT_SCROLL_BARS X11 XDBE XIM XPM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=none
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug comp comp-cstr warnings rx message rmc
puny dired dired-loaddefs rfc822 mml mml-sec epa derived epg rfc6068
epg-config gnus-util rmail rmail-loaddefs auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json map mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
time-date subr-x cl-extra shortdoc text-property-search seq byte-opt gv
bytecomp byte-compile cconv help-fns radix-tree help-mode cl-loaddefs
cl-lib iso-transl tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads dbusbind
inotify lcms2 dynamic-setting system-font-setting font-render-setting
cairo move-toolbar gtk x-toolkit x multi-tty make-network-process
native-compile emacs)

Memory information:
((conses 16 94870 10759)
 (symbols 48 7970 1)
 (strings 32 23722 1760)
 (string-bytes 1 872683)
 (vectors 16 16528)
 (vector-slots 8 305866 17210)
 (floats 8 71 35)
 (intervals 56 444 0)
 (buffers 992 14))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50718; Package emacs. (Tue, 21 Sep 2021 09:45:01 GMT) Full text and rfc822 format available.

Message #8 received at 50718 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dalanicolai <dalanicolai <at> gmail.com>
Cc: 50718 <at> debbugs.gnu.org
Subject: Re: bug#50718: 28.0.50;
 `split-string` fails on certain unicode strings
Date: Tue, 21 Sep 2021 12:44:22 +0300
tags 50718 notabug
thanks

> From: dalanicolai <dalanicolai <at> gmail.com>
> Date: Tue, 21 Sep 2021 11:27:48 +0200
> 
> Evaluate: (split-string "१०.३" ".")
> It wrongly returns a list containing only empty strings.
> Of course it should return a list with the individual Devanagari numbers.

That's a cockpit error: the SEPARATORS argument should be a regular
expression, so you should use "\\." instead.




Added tag(s) notabug. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 21 Sep 2021 09:45:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50718; Package emacs. (Tue, 21 Sep 2021 09:52:02 GMT) Full text and rfc822 format available.

Message #13 received at 50718 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: dalanicolai <dalanicolai <at> gmail.com>
Cc: 50718 <at> debbugs.gnu.org
Subject: Re: bug#50718: 28.0.50; `split-string` fails on certain unicode
 strings
Date: Tue, 21 Sep 2021 11:51:44 +0200
On Sep 21 2021, dalanicolai wrote:

> Evaluate: (split-string "१०.३" ".")
> It wrongly returns a list containing only empty strings.

You have specified all characters as separators, since "." matches any
character.  If you want to match only the period you need to use "\\."
as the regexp.
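
A minimal sketch of the difference, using only the string from the report. The `regexp-quote` variant is an addition not mentioned in the thread; it is one way to build the escaped regexp from a literal separator.

```elisp
;; SEPARATORS is a regular expression, so an unescaped "." matches any
;; single character: every character of the string acts as a separator.
(split-string "१०.३" ".")                 ; only empty strings come back

;; Escaping the period makes it match a literal "." only.
(split-string "१०.३" "\\.")               ; => ("१०" "३")

;; `regexp-quote' produces the escaped regexp from a literal string,
;; which helps when the separator is not known in advance.
(split-string "१०.३" (regexp-quote "."))  ; => ("१०" "३")
```

The doubled backslash is needed because the Lisp reader consumes one level of escaping before the regexp engine sees the string.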

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Reply sent to Stefan Kangas <stefan <at> marxist.se>:
You have taken responsibility. (Tue, 21 Sep 2021 15:30:05 GMT) Full text and rfc822 format available.

Notification sent to dalanicolai <dalanicolai <at> gmail.com>:
bug acknowledged by developer. (Tue, 21 Sep 2021 15:30:06 GMT) Full text and rfc822 format available.

Message #18 received at 50718-done <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 50718-done <at> debbugs.gnu.org, dalanicolai <dalanicolai <at> gmail.com>
Subject: Re: bug#50718: 28.0.50;
 `split-string` fails on certain unicode strings
Date: Tue, 21 Sep 2021 08:29:27 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

> tags 50718 notabug
> thanks
>
>> From: dalanicolai <dalanicolai <at> gmail.com>
>> Date: Tue, 21 Sep 2021 11:27:48 +0200
>>
>> Evaluate: (split-string "१०.३" ".")
>> It wrongly returns a list containing only empty strings.
>> Of course it should return a list with the individual Devanagari numbers.
>
> That's a cockpit error: the SEPARATORS argument should be a regular
> expression, so you should use "\\." instead.

I'm therefore closing this bug report.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50718; Package emacs. (Wed, 22 Sep 2021 08:00:02 GMT) Full text and rfc822 format available.

Message #21 received at 50718 <at> debbugs.gnu.org (full text, mbox):

From: dalanicolai <dalanicolai <at> gmail.com>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 50718 <at> debbugs.gnu.org
Subject: Re: bug#50718: 28.0.50;
 `split-string` fails on certain unicode strings
Date: Wed, 22 Sep 2021 09:59:22 +0200
Haha, okay, that was an inexperienced (or not fully awake) mistake.
Anyway, I will not forget that again, I guess. Thanks for the reply!

On Tue, 21 Sept 2021 at 11:51, Andreas Schwab <schwab <at> linux-m68k.org> wrote:

> On Sep 21 2021, dalanicolai wrote:
>
> > Evaluate: (split-string "१०.३" ".")
> > It wrongly returns a list containing only empty strings.
>
> You have specified all characters as separators, since "." matches any
> character.  If you want to match only the period you need to use "\\."
> as the regexp.
>
> Andreas.
>
> --
> Andreas Schwab, schwab <at> linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
>

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 20 Oct 2021 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 160 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.