GNU bug report logs - #55256
27.1; Unicode noncharacters may change writing direction


Package: emacs

Reported by: frederik.fouvry <at> acrolinx.com

Date: Wed, 4 May 2022 07:07:01 UTC

Severity: normal

Tags: moreinfo

Found in version 27.1

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 55256 in the body.
You can then email your comments to 55256 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#55256; Package emacs. (Wed, 04 May 2022 07:07:01 GMT)

Acknowledgement sent to frederik.fouvry <at> acrolinx.com:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 04 May 2022 07:07:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: frederik.fouvry <at> acrolinx.com
To: bug-gnu-emacs <at> gnu.org
Subject: 27.1; Unicode noncharacters may change writing direction
Date: Wed, 04 May 2022 09:06:37 +0200
M-x set-input-method RET
ucs RET
jufdd0$1 RET

If you type C-a and then step through the characters with the right
arrow, the direction is reversed. It seems like the entry of $ followed
by something else is triggering it, but that may just be an
impression/side effect.

I suspect that the reason is that most Unicode noncharacters
(https://www.unicode.org/faq/private_use.html#nonchar1) are in an Arabic
block (https://www.unicode.org/faq/private_use.html#nonchar4b) and that
the cause is an incorrect generalisation of the properties of the
characters in this block.
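
For reference, the set of noncharacters is small and fixed by the Unicode Standard: 66 code points, namely U+FDD0..U+FDEF plus the last two code points of each of the 17 planes. The following is a minimal Python sketch of a membership test. The name `is_noncharacter` is made up for illustration; it is not part of Emacs or of the original report.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if code point `cp` is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # Contiguous run of 32 noncharacters inside Arabic Presentation Forms-A.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # Last two code points of every plane: U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...
    return (cp & 0xFFFE) == 0xFFFE


# Enumerate the whole code space once and count the matches.
noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
# Arabic Presentation Forms-A spans U+FB50..U+FDFF.
in_arabic_block = [cp for cp in noncharacters if 0xFB50 <= cp <= 0xFDFF]
print(len(noncharacters), len(in_arabic_block))  # prints "66 32"
```

So 32 of the 66 noncharacters, including the U+FDD0 entered in the recipe above, lie inside the Arabic Presentation Forms-A block.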

In GNU Emacs 27.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.20, cairo version 1.16.0)
 of 2020-09-19 built on lgw01-amd64-021
Windowing system distributor 'The X.Org Foundation', version 11.0.12013000
System Description: Ubuntu 20.04.4 LTS

Recent messages:
Char: �‎ (64976, #o176720, #xfdd0, file ...) point=4361 of 7190 (61%) column=23
Mark set [2 times]
Auto-saving...done

Configured using:
 'configure --build=x86_64-linux-gnu --prefix=/usr
 '--includedir=${prefix}/include' '--mandir=${prefix}/share/man'
 '--infodir=${prefix}/share/info' --sysconfdir=/etc --localstatedir=/var
 --disable-silent-rules '--libdir=${prefix}/lib/x86_64-linux-gnu'
 '--libexecdir=${prefix}/lib/x86_64-linux-gnu' --disable-maintainer-mode
 --disable-dependency-tracking --prefix=/usr --sharedstatedir=/var/lib
 --libexecdir=/usr/lib --localstatedir=/var/lib
 --infodir=/usr/share/info --mandir=/usr/share/man
 --enable-locallisppath=/etc/emacs:/usr/local/share/emacs/27.1/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/27.1/site-lisp:/usr/share/emacs/site-lisp
 --program-suffix=27 --with-modules --with-file-notification=inotify
 --with-mailutils --with-harfbuzz --with-json --with-x=yes
 --with-x-toolkit=gtk3 --with-lcms2 --with-cairo --with-xpm=yes
 --with-gif=yes --with-gnutls=yes --with-jpeg=yes --with-png=yes
 --with-tiff=yes --with-xwidgets 'CFLAGS=-g -O2
 -fdebug-prefix-map=/build/emacs27-bifpWT/emacs27-27.1~1.git86d8d76aa3=. -fstack-protector-strong
 -Wformat -Werror=format-security -no-pie' 'CPPFLAGS=-Wdate-time
 -D_FORTIFY_SOURCE=2' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro
 -no-pie''

Configured features:
XPM JPEG TIFF GIF PNG RSVG CAIRO SOUND GPM DBUS GSETTINGS GLIB NOTIFY
INOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ M17N_FLT LIBOTF
ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS XWIDGETS
LIBSYSTEMD JSON PDUMPER LCMS2 GMP

Important settings:
  value of $LC_MONETARY: en_GB.UTF-8
  value of $LC_NUMERIC: en_GB.UTF-8
  value of $LC_TIME: en_GB.UTF-8
  value of $LANG: en_GB.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  hexl-follow-ascii: t
  global-git-commit-mode: t
  magit-auto-revert-mode: t
  shell-dirtrack-mode: t
  global-activity-watch-mode: t
  activity-watch-mode: t
  auto-revert-mode: t
  show-paren-mode: t
  desktop-save-mode: t
  display-time-mode: t
  editorconfig-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:

Features:
(shadow sort mail-extr emacsbug ruler-mode hexl uni-input quail misearch
multi-isearch magit-submodule magit-obsolete magit-blame magit-stash
magit-reflog magit-bisect magit-push magit-pull magit-fetch magit-clone
magit-remote magit-commit magit-sequence magit-notes magit-worktree
magit-tag magit-merge magit-branch magit-reset magit-files magit-refs
magit-status magit magit-repos magit-apply magit-wip magit-log
which-func magit-diff smerge-mode diff git-commit magit-core
magit-autorevert magit-margin magit-transient magit-process with-editor
shell magit-mode transient magit-git magit-base magit-section crm dash
compat-27 compat-26 compat flymake-shellcheck flymake-proc flymake
compile sh-script executable css-mode smie imenu rng-xsd xsd-regexp
rng-cmpct rng-nxml rng-valid rng-loc rng-uri rng-parse nxml-parse
rng-match rng-dt rng-util rng-pttrn nxml-ns nxml-mode nxml-outln
nxml-rap sgml-mode nxml-util nxml-enc xmltok ol-eww eww mm-url url-queue
ol-rmail ol-mhe ol-irc ol-info ol-gnus nnir ol-docview ol-bibtex ol-bbdb
ol-w3m ol-doi org-link-doi vc-git diff-mode markdown-mode edit-indirect
color dired-aux server activity-watch-mode request autorevert filenotify
ert pp ewoc debug backtrace paren desktop frameset cus-start cus-load
auto-dictionary flyspell ispell time editorconfig-core
editorconfig-core-handle editorconfig-fnmatch timeclock mu4e mu4e-org
mu4e-main mu4e-view mu4e-view-gnus gnus-art mm-uu mml2015 mm-view
mml-smime smime dig gnus-sum url url-proxy url-privacy url-expand
url-methods url-history gnus-group gnus-undo gnus-start gnus-cloud
nnimap nnmail mail-source utf7 netrc nnoo parse-time iso8601 gnus-spec
gnus-int gnus-range gnus-win gnus nnheader wid-edit mu4e-view-common
thingatpt mu4e-headers mu4e-compose mu4e-context mu4e-draft mu4e-actions
ido rfc2368 smtpmail sendmail mu4e-mark mu4e-proc mu4e-utils doc-view
jka-compr image-mode exif mu4e-lists mu4e-message shr url-cookie
url-domsuf url-util svg xml dom flow-fill mule-util mailcap hl-line
mu4e-vars mu4e-meta dired-x calfw-org org-capture org-element avl-tree
generator org-agenda org-refile calfw holidays hol-loaddefs cl
org-journal edmacro kmacro org-crypt org ob ob-tangle ob-ref ob-lob
ob-table ob-exp org-macro org-footnote org-src ob-comint org-pcomplete
pcomplete comint ansi-color org-list org-faces org-entities noutline
outline org-version ob-emacs-lisp ob-core ob-eval org-table oc-basic
bibtex ol rx org-keys oc org-compat advice org-macs org-loaddefs
find-func cal-iso cal-menu calendar cal-loaddefs vc-svn dsvn log-edit
easy-mmode message rmc puny dired dired-loaddefs format-spec rfc822 mml
mml-sec epa derived epg epg-config gnus-util rmail rmail-loaddefs
warnings text-property-search time-date mm-decode mm-bodies mm-encode
mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr
mailabbrev mail-utils gmm-utils mailheader ring pcvs-util add-log vc
vc-dispatcher editorconfig cl-extra help-mode use-package-ensure
use-package-core helm-easymenu info package easymenu browse-url
url-handlers url-parse auth-source cl-seq eieio eieio-core cl-macs
eieio-loaddefs password-cache json subr-x map url-vars seq byte-opt gv
bytecomp byte-compile cconv cl-loaddefs cl-lib tooltip eldoc electric
uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads dbusbind inotify lcms2 dynamic-setting
system-font-setting font-render-setting xwidget-internal cairo
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 398834 33888)
 (symbols 48 37214 1)
 (strings 32 137584 6008)
 (string-bytes 1 4673915)
 (vectors 16 73119)
 (vector-slots 8 1675119 195896)
 (floats 8 531 329)
 (intervals 56 7085 0)
 (buffers 1000 76))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55256; Package emacs. (Wed, 04 May 2022 07:21:02 GMT)

Message #8 received at 55256 <at> debbugs.gnu.org:

From: frederik.fouvry <at> acrolinx.com
To: 55256 <at> debbugs.gnu.org
Subject: Writing direction
Date: Wed, 04 May 2022 09:20:05 +0200
I forgot to add:

The Unicode noncharacters should not cause a change in writing
direction, since they are not Arabic characters, but a set of characters
for internal use only (no exchange between different parties).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55256; Package emacs. (Wed, 04 May 2022 08:18:02 GMT)

Message #11 received at 55256 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: frederik.fouvry <at> acrolinx.com
Cc: 55256 <at> debbugs.gnu.org
Subject: Re: bug#55256: 27.1; Unicode noncharacters may change writing direction
Date: Wed, 04 May 2022 11:17:38 +0300
> Date: Wed, 04 May 2022 09:06:37 +0200
> From: frederik.fouvry--- via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> 
> M-x set-input-method RET
> ucs RET
> jufdd0$1 RET
> 
> If you type C-a and then step through the characters with the right
> arrow, the direction is reversed. It seems like the entry of $ followed
> by something else is triggering it, but that may just be an
> impression/side effect.
> 
> I suspect that the reason is that most Unicode noncharacters
> (https://www.unicode.org/faq/private_use.html#nonchar1) are in an Arabic
> block (https://www.unicode.org/faq/private_use.html#nonchar4b) and that
> the cause is an incorrect generalisation of the properties of the
> characters in this block.

Correct.  We failed to be in sync with the Unicode Standard in this
regard.  Should be fixed now on the master branch.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55256; Package emacs. (Wed, 04 May 2022 08:24:01 GMT)

Message #14 received at 55256 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: frederik.fouvry <at> acrolinx.com
Cc: 55256 <at> debbugs.gnu.org
Subject: Re: bug#55256: Writing direction
Date: Wed, 04 May 2022 11:24:04 +0300
> Date: Wed, 04 May 2022 09:20:05 +0200
> From: frederik.fouvry--- via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> 
> I forgot to add:
> 
> The Unicode noncharacters should not cause a change in writing
> direction, since they are not Arabic characters, but a set of characters
> for internal use only (no exchange between different parties).

That is not entirely true, because Unicode assigns default Bidi Class
properties to some unassigned codepoints, and Emacs obeys that.  So an
unassigned codepoint (which is AFAIU what "noncharacter" stands for in
your terminology) for which Unicode says that its Bidi Class should
be, for example, AL, _will_ cause change of text directionality.

If you use those unassigned codepoints for private use, you will have
to override the default properties by manually modifying the relevant
Emacs char-tables at run time.
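
The distinction between defaults can be sketched in Python for the one block involved here. The ranges follow the defaults documented in the Unicode Character Database (DerivedBidiClass.txt): unassigned code points in Arabic Presentation Forms-A default to AL, except the noncharacters U+FDD0..U+FDEF, which default to BN. The function name is hypothetical, and this is neither Emacs code nor a full Bidi_Class lookup: it ignores every other block and the actual class of any assigned character.

```python
from typing import Optional


def default_bidi_class_in_arabic_pf_a(cp: int) -> Optional[str]:
    """Default Bidi_Class for an *unassigned* code point in U+FB50..U+FDFF.

    Returns None outside that block. Assumption: the caller has already
    established that `cp` is unassigned; assigned characters carry their
    own Bidi_Class, which this sketch does not look up.
    """
    # Noncharacters default to BN (boundary neutral), not to the block default.
    if 0xFDD0 <= cp <= 0xFDEF:
        return "BN"
    # Other unassigned code points in the block default to AL (Arabic letter).
    if 0xFB50 <= cp <= 0xFDFF:
        return "AL"
    return None


for cp in (0xFDD0, 0xFDEF, 0xFBC3, 0x0041):
    print(f"U+{cp:04X}", default_bidi_class_in_arabic_pf_a(cp))
```

A code point that resolves to AL starts a right-to-left run, whereas BN has no strong direction of its own.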




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55256; Package emacs. (Wed, 04 May 2022 09:04:02 GMT)

Message #17 received at 55256 <at> debbugs.gnu.org:

From: Frederik Fouvry <frederik.fouvry <at> acrolinx.com>
To: 55256 <at> debbugs.gnu.org
Subject: Re: bug#55256: Writing direction
Date: Wed, 4 May 2022 11:02:50 +0200
On Wed, 4 May 2022 at 10:23, Eli Zaretskii <eliz <at> gnu.org> wrote:

> > Date: Wed, 04 May 2022 09:20:05 +0200
> > From: frederik.fouvry--- via "Bug reports for GNU Emacs,
> >  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> >
> >
> > I forgot to add:
> >
> > The Unicode noncharacters should not cause a change in writing
> > direction, since they are not Arabic characters, but a set of characters
> > for internal use only (no exchange between different parties).
>
> That is not entirely true, because Unicode assigns default Bidi Class
> properties to some unassigned codepoints, and Emacs obeys that.  So an
> unassigned codepoint (which is AFAIU what "noncharacter" stands for in
> your terminology) for which Unicode says that its Bidi Class should
> be, for example, AL, _will_ cause change of text directionality.
>
> If you use those unassigned codepoints for private use, you will have
> to override the default properties by manually modifying the relevant
> Emacs char-tables at run time.
>

That sounds fair enough. I admit that my statement was generalising too
much.

The odd name "noncharacter" is Unicode terminology, not mine (see e.g. Spec
v14, Ch. 2, p.30).

Added tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 05 May 2022 11:15:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55256; Package emacs. (Thu, 02 Jun 2022 13:10:02 GMT)

Message #22 received at 55256 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Frederik Fouvry <frederik.fouvry <at> acrolinx.com>
Cc: 55256 <at> debbugs.gnu.org
Subject: Re: bug#55256: 27.1; Unicode noncharacters may change writing direction
Date: Thu, 02 Jun 2022 15:09:15 +0200
Frederik Fouvry <frederik.fouvry <at> acrolinx.com> writes:

>  If you use those unassigned codepoints for private use, you will have
>  to override the default properties by manually modifying the relevant
>  Emacs char-tables at run time.
>
> That sounds fair enough. I admit that my statement was generalising too much.

Eli fixed some bits here, and the rest is up to the users of these
unassigned codepoints, if I understand correctly, so I'm closing this
bug report.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 29.1; send any further explanations to 55256 <at> debbugs.gnu.org and frederik.fouvry <at> acrolinx.com. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 02 Jun 2022 13:10:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 01 Jul 2022 11:24:05 GMT)

This bug report was last modified 1 year and 298 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.