GNU bug report logs - #29871
25.3; ZWJ word-boundaries in regexps

Package: emacs;

Reported by: "Mark Shoulson" <mark <at> nagas.meson.org>

Date: Wed, 27 Dec 2017 20:18:01 UTC

Severity: minor

Tags: notabug

Found in version 25.3

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29871 in the body.
You can then email your comments to 29871 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#29871; Package emacs. (Wed, 27 Dec 2017 20:18:01 GMT)

Acknowledgement sent to "Mark Shoulson" <mark <at> nagas.meson.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 27 Dec 2017 20:18:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Mark Shoulson" <mark <at> nagas.meson.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 25.3; ZWJ word-boundaries in regexps
Date: Wed, 27 Dec 2017 14:07:40 -0500

According to http://unicode.org/reports/tr29/#Word_Boundaries rule WB4,
it would seem that a ZWJ character (U+200D ZERO WIDTH JOINER) between
two "word" characters should not constitute a word boundary.  And yet:

(string-match "\\<" "foo\u200Dfbar" 1)

evaluates to 4 (the 1 is to skip the word-beginning at the start of the
string).  Or you can search for "\\b" or "\\>" and get 3.  Either way,
indicative of a word-break at the ZWJ character.  Is this correct?

~mark


In GNU Emacs 25.3.1 (x86_64-redhat-linux-gnu, GTK+ Version 3.22.19)
 of 2017-09-14 built on buildvm-29.phx2.fedoraproject.org
Configured using:
 'configure --build=x86_64-redhat-linux-gnu
 --host=x86_64-redhat-linux-gnu --program-prefix=
 --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
 --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
 --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
 --libexecdir=/usr/libexec --localstatedir=/var
 --sharedstatedir=/var/lib --mandir=/usr/share/man
 --infodir=/usr/share/info --with-dbus --with-gif --with-jpeg --with-png
 --with-rsvg --with-tiff --with-xft --with-xpm --with-x-toolkit=gtk3
 --with-gpm=no --with-xwidgets --with-modules
 build_alias=x86_64-redhat-linux-gnu host_alias=x86_64-redhat-linux-gnu
 'CFLAGS=-DMAIL_USE_LOCKF -O2 -g -pipe -Wall -Werror=format-security
 -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
 --param=ssp-buffer-size=4 -grecord-gcc-switches
 -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic'
 LDFLAGS=-Wl,-z,relro
 PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GCONF GSETTINGS NOTIFY
ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES XWIDGETS

Important settings:
  value of $LANG: en_US.utf8
  value of $XMODIFIERS: @im=none
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
../../usr/share/emacs/site-lisp/uim-el/uim-key.el: (lambda (x) ...) quoted with ' rather than with #' [3 times]
../../usr/share/emacs/site-lisp/uim-el/uim-preedit.el: (lambda (x) ...) quoted with ' rather than with #'
../../usr/share/emacs/site-lisp/uim-el/uim-candidate.el: (lambda (x) ...) quoted with ' rather than with #' [5 times]
../../usr/share/emacs/site-lisp/uim-el/uim-helper.el: (lambda (x) ...) quoted with ' rather than with #' [2 times]
../../usr/share/emacs/site-lisp/uim-el/uim.el: (lambda (x) ...) quoted with ' rather than with #' [9 times]
../../usr/share/emacs/site-lisp/uim-el/uim-leim.el: (lambda (x) ...) quoted with ' rather than with #'
uim.el: starting uim-el-helper-agent... done
uim.el: starting uim-el-agent... done
Loading /usr/share/emacs/site-lisp/site-start.d/uim-init.el (source)...done
For information about GNU Emacs and the GNU system, type C-_ C-a.

Load-path shadows:
~/lib/yaml-mode hides /home/mark/.emacs.d/elpa/yaml-mode-20170727.1531/yaml-mode
/usr/share/emacs/site-lisp/site-start.d/maxima-modes hides /usr/share/emacs/site-lisp/maxima/site_start.d/maxima-modes
~/lib/mwheel hides /usr/share/emacs/25.3/lisp/mwheel
~/lib/css-mode hides /usr/share/emacs/25.3/lisp/textmodes/css-mode
~/lib/cperl-mode hides /usr/share/emacs/25.3/lisp/progmodes/cperl-mode
/home/mark/.emacs.d/elpa/org-20170606/ob-table hides /usr/share/emacs/25.3/lisp/org/ob-table
/home/mark/.emacs.d/elpa/org-20170606/ob-sass hides /usr/share/emacs/25.3/lisp/org/ob-sass
/home/mark/.emacs.d/elpa/org-20170606/ob-lilypond hides /usr/share/emacs/25.3/lisp/org/ob-lilypond
/home/mark/.emacs.d/elpa/org-20170606/org-pcomplete hides /usr/share/emacs/25.3/lisp/org/org-pcomplete
/home/mark/.emacs.d/elpa/org-20170606/ox-man hides /usr/share/emacs/25.3/lisp/org/ox-man
/home/mark/.emacs.d/elpa/org-20170606/org-list hides /usr/share/emacs/25.3/lisp/org/org-list
/home/mark/.emacs.d/elpa/org-20170606/ob-core hides /usr/share/emacs/25.3/lisp/org/ob-core
/home/mark/.emacs.d/elpa/org-20170606/org-compat hides /usr/share/emacs/25.3/lisp/org/org-compat
/home/mark/.emacs.d/elpa/org-20170606/ob-dot hides /usr/share/emacs/25.3/lisp/org/ob-dot
/home/mark/.emacs.d/elpa/org-20170606/org-faces hides /usr/share/emacs/25.3/lisp/org/org-faces
/home/mark/.emacs.d/elpa/org-20170606/org-mouse hides /usr/share/emacs/25.3/lisp/org/org-mouse
/home/mark/.emacs.d/elpa/org-20170606/ob-makefile hides /usr/share/emacs/25.3/lisp/org/ob-makefile
/home/mark/.emacs.d/elpa/org-20170606/ob-perl hides /usr/share/emacs/25.3/lisp/org/ob-perl
/home/mark/.emacs.d/elpa/org-20170606/org-irc hides /usr/share/emacs/25.3/lisp/org/org-irc
/home/mark/.emacs.d/elpa/org-20170606/org-mobile hides /usr/share/emacs/25.3/lisp/org/org-mobile
/home/mark/.emacs.d/elpa/org-20170606/org-rmail hides /usr/share/emacs/25.3/lisp/org/org-rmail
/home/mark/.emacs.d/elpa/org-20170606/ob-asymptote hides /usr/share/emacs/25.3/lisp/org/ob-asymptote
/home/mark/.emacs.d/elpa/org-20170606/ob-matlab hides /usr/share/emacs/25.3/lisp/org/ob-matlab
/home/mark/.emacs.d/elpa/org-20170606/org-indent hides /usr/share/emacs/25.3/lisp/org/org-indent
/home/mark/.emacs.d/elpa/org-20170606/org hides /usr/share/emacs/25.3/lisp/org/org
/home/mark/.emacs.d/elpa/org-20170606/ob-haskell hides /usr/share/emacs/25.3/lisp/org/ob-haskell
/home/mark/.emacs.d/elpa/org-20170606/org-plot hides /usr/share/emacs/25.3/lisp/org/org-plot
/home/mark/.emacs.d/elpa/org-20170606/org-feed hides /usr/share/emacs/25.3/lisp/org/org-feed
/home/mark/.emacs.d/elpa/org-20170606/org-bibtex hides /usr/share/emacs/25.3/lisp/org/org-bibtex
/home/mark/.emacs.d/elpa/org-20170606/org-src hides /usr/share/emacs/25.3/lisp/org/org-src
/home/mark/.emacs.d/elpa/org-20170606/ob-awk hides /usr/share/emacs/25.3/lisp/org/ob-awk
/home/mark/.emacs.d/elpa/org-20170606/org-gnus hides /usr/share/emacs/25.3/lisp/org/org-gnus
/home/mark/.emacs.d/elpa/org-20170606/org-macs hides /usr/share/emacs/25.3/lisp/org/org-macs
/home/mark/.emacs.d/elpa/org-20170606/ob-octave hides /usr/share/emacs/25.3/lisp/org/ob-octave
/home/mark/.emacs.d/elpa/org-20170606/org-table hides /usr/share/emacs/25.3/lisp/org/org-table
/home/mark/.emacs.d/elpa/org-20170606/ob-scala hides /usr/share/emacs/25.3/lisp/org/ob-scala
/home/mark/.emacs.d/elpa/org-20170606/ox-org hides /usr/share/emacs/25.3/lisp/org/ox-org
/home/mark/.emacs.d/elpa/org-20170606/org-version hides /usr/share/emacs/25.3/lisp/org/org-version
/home/mark/.emacs.d/elpa/org-20170606/ox-beamer hides /usr/share/emacs/25.3/lisp/org/ox-beamer
/home/mark/.emacs.d/elpa/org-20170606/ob-C hides /usr/share/emacs/25.3/lisp/org/ob-C
/home/mark/.emacs.d/elpa/org-20170606/ob-ref hides /usr/share/emacs/25.3/lisp/org/ob-ref
/home/mark/.emacs.d/elpa/org-20170606/ox hides /usr/share/emacs/25.3/lisp/org/ox
/home/mark/.emacs.d/elpa/org-20170606/ox-ascii hides /usr/share/emacs/25.3/lisp/org/ox-ascii
/home/mark/.emacs.d/elpa/org-20170606/org-bbdb hides /usr/share/emacs/25.3/lisp/org/org-bbdb
/home/mark/.emacs.d/elpa/org-20170606/ob-java hides /usr/share/emacs/25.3/lisp/org/ob-java
/home/mark/.emacs.d/elpa/org-20170606/org-agenda hides /usr/share/emacs/25.3/lisp/org/org-agenda
/home/mark/.emacs.d/elpa/org-20170606/ob-mscgen hides /usr/share/emacs/25.3/lisp/org/ob-mscgen
/home/mark/.emacs.d/elpa/org-20170606/ob-org hides /usr/share/emacs/25.3/lisp/org/ob-org
/home/mark/.emacs.d/elpa/org-20170606/ob-js hides /usr/share/emacs/25.3/lisp/org/ob-js
/home/mark/.emacs.d/elpa/org-20170606/org-w3m hides /usr/share/emacs/25.3/lisp/org/org-w3m
/home/mark/.emacs.d/elpa/org-20170606/ob-comint hides /usr/share/emacs/25.3/lisp/org/ob-comint
/home/mark/.emacs.d/elpa/org-20170606/ob-sqlite hides /usr/share/emacs/25.3/lisp/org/ob-sqlite
/home/mark/.emacs.d/elpa/org-20170606/org-protocol hides /usr/share/emacs/25.3/lisp/org/org-protocol
/home/mark/.emacs.d/elpa/org-20170606/org-clock hides /usr/share/emacs/25.3/lisp/org/org-clock
/home/mark/.emacs.d/elpa/org-20170606/ob-picolisp hides /usr/share/emacs/25.3/lisp/org/ob-picolisp
/home/mark/.emacs.d/elpa/org-20170606/ob hides /usr/share/emacs/25.3/lisp/org/ob
/home/mark/.emacs.d/elpa/org-20170606/org-loaddefs hides /usr/share/emacs/25.3/lisp/org/org-loaddefs
/home/mark/.emacs.d/elpa/org-20170606/ob-calc hides /usr/share/emacs/25.3/lisp/org/ob-calc
/home/mark/.emacs.d/elpa/org-20170606/ob-lob hides /usr/share/emacs/25.3/lisp/org/ob-lob
/home/mark/.emacs.d/elpa/org-20170606/org-eshell hides /usr/share/emacs/25.3/lisp/org/org-eshell
/home/mark/.emacs.d/elpa/org-20170606/org-habit hides /usr/share/emacs/25.3/lisp/org/org-habit
/home/mark/.emacs.d/elpa/org-20170606/ob-python hides /usr/share/emacs/25.3/lisp/org/ob-python
/home/mark/.emacs.d/elpa/org-20170606/ob-fortran hides /usr/share/emacs/25.3/lisp/org/ob-fortran
/home/mark/.emacs.d/elpa/org-20170606/org-archive hides /usr/share/emacs/25.3/lisp/org/org-archive
/home/mark/.emacs.d/elpa/org-20170606/ob-clojure hides /usr/share/emacs/25.3/lisp/org/ob-clojure
/home/mark/.emacs.d/elpa/org-20170606/org-timer hides /usr/share/emacs/25.3/lisp/org/org-timer
/home/mark/.emacs.d/elpa/org-20170606/ob-exp hides /usr/share/emacs/25.3/lisp/org/ob-exp
/home/mark/.emacs.d/elpa/org-20170606/ob-shen hides /usr/share/emacs/25.3/lisp/org/ob-shen
/home/mark/.emacs.d/elpa/org-20170606/org-element hides /usr/share/emacs/25.3/lisp/org/org-element
/home/mark/.emacs.d/elpa/org-20170606/org-docview hides /usr/share/emacs/25.3/lisp/org/org-docview
/home/mark/.emacs.d/elpa/org-20170606/ox-md hides /usr/share/emacs/25.3/lisp/org/ox-md
/home/mark/.emacs.d/elpa/org-20170606/org-ctags hides /usr/share/emacs/25.3/lisp/org/org-ctags
/home/mark/.emacs.d/elpa/org-20170606/org-inlinetask hides /usr/share/emacs/25.3/lisp/org/org-inlinetask
/home/mark/.emacs.d/elpa/org-20170606/ob-keys hides /usr/share/emacs/25.3/lisp/org/ob-keys
/home/mark/.emacs.d/elpa/org-20170606/ob-ledger hides /usr/share/emacs/25.3/lisp/org/ob-ledger
/home/mark/.emacs.d/elpa/org-20170606/org-entities hides /usr/share/emacs/25.3/lisp/org/org-entities
/home/mark/.emacs.d/elpa/org-20170606/org-attach hides /usr/share/emacs/25.3/lisp/org/org-attach
/home/mark/.emacs.d/elpa/org-20170606/ox-odt hides /usr/share/emacs/25.3/lisp/org/ox-odt
/home/mark/.emacs.d/elpa/org-20170606/ob-ocaml hides /usr/share/emacs/25.3/lisp/org/ob-ocaml
/home/mark/.emacs.d/elpa/org-20170606/ob-gnuplot hides /usr/share/emacs/25.3/lisp/org/ob-gnuplot
/home/mark/.emacs.d/elpa/org-20170606/ob-maxima hides /usr/share/emacs/25.3/lisp/org/ob-maxima
/home/mark/.emacs.d/elpa/org-20170606/ob-latex hides /usr/share/emacs/25.3/lisp/org/ob-latex
/home/mark/.emacs.d/elpa/org-20170606/ox-latex hides /usr/share/emacs/25.3/lisp/org/ox-latex
/home/mark/.emacs.d/elpa/org-20170606/ox-texinfo hides /usr/share/emacs/25.3/lisp/org/ox-texinfo
/home/mark/.emacs.d/elpa/org-20170606/ob-scheme hides /usr/share/emacs/25.3/lisp/org/ob-scheme
/home/mark/.emacs.d/elpa/org-20170606/org-crypt hides /usr/share/emacs/25.3/lisp/org/org-crypt
/home/mark/.emacs.d/elpa/org-20170606/ob-eval hides /usr/share/emacs/25.3/lisp/org/ob-eval
/home/mark/.emacs.d/elpa/org-20170606/ox-publish hides /usr/share/emacs/25.3/lisp/org/ox-publish
/home/mark/.emacs.d/elpa/org-20170606/ob-lisp hides /usr/share/emacs/25.3/lisp/org/ob-lisp
/home/mark/.emacs.d/elpa/org-20170606/org-info hides /usr/share/emacs/25.3/lisp/org/org-info
/home/mark/.emacs.d/elpa/org-20170606/ob-ditaa hides /usr/share/emacs/25.3/lisp/org/ob-ditaa
/home/mark/.emacs.d/elpa/org-20170606/ob-R hides /usr/share/emacs/25.3/lisp/org/ob-R
/home/mark/.emacs.d/elpa/org-20170606/org-datetree hides /usr/share/emacs/25.3/lisp/org/org-datetree
/home/mark/.emacs.d/elpa/org-20170606/ox-icalendar hides /usr/share/emacs/25.3/lisp/org/ox-icalendar
/home/mark/.emacs.d/elpa/org-20170606/ob-io hides /usr/share/emacs/25.3/lisp/org/ob-io
/home/mark/.emacs.d/elpa/org-20170606/org-footnote hides /usr/share/emacs/25.3/lisp/org/org-footnote
/home/mark/.emacs.d/elpa/org-20170606/org-mhe hides /usr/share/emacs/25.3/lisp/org/org-mhe
/home/mark/.emacs.d/elpa/org-20170606/org-colview hides /usr/share/emacs/25.3/lisp/org/org-colview
/home/mark/.emacs.d/elpa/org-20170606/ob-css hides /usr/share/emacs/25.3/lisp/org/ob-css
/home/mark/.emacs.d/elpa/org-20170606/ob-plantuml hides /usr/share/emacs/25.3/lisp/org/ob-plantuml
/home/mark/.emacs.d/elpa/org-20170606/ob-emacs-lisp hides /usr/share/emacs/25.3/lisp/org/ob-emacs-lisp
/home/mark/.emacs.d/elpa/org-20170606/ox-html hides /usr/share/emacs/25.3/lisp/org/ox-html
/home/mark/.emacs.d/elpa/org-20170606/org-macro hides /usr/share/emacs/25.3/lisp/org/org-macro
/home/mark/.emacs.d/elpa/org-20170606/ob-ruby hides /usr/share/emacs/25.3/lisp/org/ob-ruby
/home/mark/.emacs.d/elpa/org-20170606/org-id hides /usr/share/emacs/25.3/lisp/org/org-id
/home/mark/.emacs.d/elpa/org-20170606/ob-tangle hides /usr/share/emacs/25.3/lisp/org/ob-tangle
/home/mark/.emacs.d/elpa/org-20170606/ob-screen hides /usr/share/emacs/25.3/lisp/org/ob-screen
/home/mark/.emacs.d/elpa/org-20170606/ob-sql hides /usr/share/emacs/25.3/lisp/org/ob-sql
/home/mark/.emacs.d/elpa/org-20170606/org-install hides /usr/share/emacs/25.3/lisp/org/org-install
/home/mark/.emacs.d/elpa/org-20170606/org-capture hides /usr/share/emacs/25.3/lisp/org/org-capture

Features:
(shadow sort mail-extr emacsbug message idna dired format-spec rfc822
mml mml-sec password-cache epg gnus-util mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047
rfc2045 ietf-drums mm-util help-fns mail-prsvr mail-utils term/xterm
xterm time-date disp-table org-install finder-inf cl-seq cl-macs info rx
package epg-config seq byte-opt gv bytecomp byte-compile cl-extra
help-mode easymenu cconv cl-loaddefs pcase cl-lib uim-leim uim advice
uim-helper uim-candidate uim-preedit uim-key uim-util uim-debug
uim-keymap uim-var uim-version mule-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment
elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan
thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian
slovak czech european ethiopic indian cyrillic chinese charscript
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify
dynamic-setting system-font-setting font-render-setting xwidget-internal
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 131598 4734)
 (symbols 48 23041 0)
 (miscs 40 45 154)
 (strings 32 23862 4596)
 (string-bytes 1 696636)
 (vectors 16 13413)
 (vector-slots 8 423793 2418)
 (floats 8 198 588)
 (intervals 56 274 8)
 (buffers 976 20))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29871; Package emacs. (Wed, 27 Dec 2017 20:34:02 GMT)

Message #8 received at 29871 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Mark Shoulson" <mark <at> nagas.meson.org>
Cc: 29871 <at> debbugs.gnu.org
Subject: Re: bug#29871: 25.3; ZWJ word-boundaries in regexps
Date: Wed, 27 Dec 2017 22:33:22 +0200

> From: "Mark Shoulson" <mark <at> nagas.meson.org>
> Date: Wed, 27 Dec 2017 14:07:40 -0500
> 
> According to http://unicode.org/reports/tr29/#Word_Boundaries rule WB4,
> it would seem that a ZWJ character (U+200D ZERO WIDTH JOINER) between
> two "word" characters should not constitute a word boundary.  And yet:
> 
> (string-match "\\<" "foo\u200Dfbar" 1)
> 
> evaluates to 4 (the 1 is to skip the word-beginning at the start of the
> string).  Or you can search for "\\b" or "\\>" and get 3.  Either way,
> indicative of a word-break at the ZWJ character.  Is this correct?

Emacs considers a change of script as a word break, and U+200D's
script is 'symbol', which is different from 'latin', the script of the
ASCII characters.
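
For anyone who wants to check this locally, the following is a minimal
sketch, not taken from the Emacs sources.  It assumes that
`char-script-table' and `char-syntax' are the relevant places to look;
`my-zwj-report' is a hypothetical helper name made up for illustration.
Note that `\<' and `\>' are defined in terms of word-constituent syntax,
so the syntax class of U+200D in the current buffer may matter as well
as its script.  The values returned can differ between Emacs versions
and syntax tables, so none are asserted here.

```elisp
;; Sketch only: collect the properties that could explain the word break.
;; `my-zwj-report' is a hypothetical name, not part of Emacs.
(defun my-zwj-report ()
  "Return an alist describing how Emacs classifies U+200D and ASCII `o'."
  (let ((zwj #x200D)
        (sample "foo\u200Dfbar"))   ; the string from the original report
    (list
     ;; Script assigned to ZWJ and to an ASCII letter.
     (cons 'zwj-script   (aref char-script-table zwj))
     (cons 'latin-script (aref char-script-table ?o))
     ;; Syntax class of ZWJ in the current buffer's syntax table.
     (cons 'zwj-syntax   (string (char-syntax zwj)))
     ;; Positions of the first word start and word end at or after index 1.
     (cons 'word-begin   (string-match "\\<" sample 1))
     (cons 'word-end     (string-match "\\>" sample 1)))))

(my-zwj-report)
```

Evaluating the last form (for example with C-j in *scratch*) returns the
alist, which can then be compared against the values quoted above.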




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29871; Package emacs. (Sat, 28 Sep 2019 23:29:02 GMT)

Message #11 received at 29871 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Mark Shoulson <mark <at> nagas.meson.org>, 29871 <at> debbugs.gnu.org
Subject: Re: bug#29871: 25.3; ZWJ word-boundaries in regexps
Date: Sun, 29 Sep 2019 01:28:02 +0200

tags 29871 + notabug
close 29871
quit

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: "Mark Shoulson" <mark <at> nagas.meson.org>
>> Date: Wed, 27 Dec 2017 14:07:40 -0500
>>
>> According to http://unicode.org/reports/tr29/#Word_Boundaries rule WB4,
>> it would seem that a ZWJ character (U+200D ZERO WIDTH JOINER) between
>> two "word" characters should not constitute a word boundary.  And yet:
>>
>> (string-match "\\<" "foo\u200Dfbar" 1)
>>
>> evaluates to 4 (the 1 is to skip the word-beginning at the start of the
>> string).  Or you can search for "\\b" or "\\>" and get 3.  Either way,
>> indicative of a word-break at the ZWJ character.  Is this correct?
>
> Emacs considers a change of script as a word break, and U+200D's
> script is 'symbol', which is different from 'latin', the script of the
> ASCII characters.

According to the above explanation, this behaviour is expected.  I'm
therefore closing this as notabug.

Best regards,
Stefan Kangas




Added tag(s) notabug. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Sat, 28 Sep 2019 23:29:02 GMT)

bug closed, send any further explanations to 29871 <at> debbugs.gnu.org and "Mark Shoulson" <mark <at> nagas.meson.org> Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Sat, 28 Sep 2019 23:29:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 27 Oct 2019 11:24:12 GMT)

This bug report was last modified 4 years and 183 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.