GNU bug report logs - #61269
28.2; Sequence of spaces preceding tab in bidirectional line

Package: emacs

Reported by: Halim <mhalimln <at> outlook.com>

Date: Sat, 4 Feb 2023 07:47:02 UTC

Severity: normal

Found in version 28.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 61269 in the body.
You can then email your comments to 61269 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#61269; Package emacs. (Sat, 04 Feb 2023 07:47:02 GMT)

Acknowledgement sent to Halim <mhalimln <at> outlook.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 04 Feb 2023 07:47:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Halim <mhalimln <at> outlook.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.2; Sequence of spaces preceding tab in bidirectional line
Date: Sat, 04 Feb 2023 02:41:35 +0700
  In a left-to-right line, Emacs displays a sequence of one or more
spaces (U+0020) to the left of the first (in typing order) RTL
letter when the spaces precede a tab (U+0009) and both appear
between two right-to-left letters.

  The bug does not occur when the RTL text is inside an RTL
isolate.

  Let s represent a space, t represent a tab, l represent itself, and
r and m represent Arabic letters. The following example has this
format, in typing order from left to right.

Format:
lsrssstm

Example text:
l ح   	م

  The expected display is 'lsrssstm'; the actual display is
'lssssrtm'. The spaces following 'r' in the format are displayed to
the left of 'r' in the actual display. Pressing 'C-f' from 'r' moves
the cursor to the left until it reaches 't', where the cursor moves
to the right of 'r'.

  I have tried viewing the file containing the buggy text in
FocusWriter and fribidi. Both display it the expected way.

Extra Info

  The bug also occurs with LTR text on an RTL line. I believe
it is generic and is caused by this line
'&& level != bidi_it->level_stack[0].level' (see below).

  The bug is also present in Emacs built from commit
'ac7ec87a7a0db887e4ae7fe9005aea517958b778' with
--without-all. On top of this commit I made the following
modification.

---------------
$ git diff ac7ec87a7a0db887e4ae7fe9005aea517958b778
diff --git a/src/bidi.c b/src/bidi.c
index e012512..fe6e4d6 100644
--- a/src/bidi.c
+++ b/src/bidi.c
@@ -3302,10 +3302,7 @@ bidi_level_of_next_char (struct bidi_it *bidi_it)
   if ((bidi_it->orig_type == NEUTRAL_WS
        || bidi_it->orig_type == WEAK_BN
        || bidi_isolate_fmt_char (bidi_it->orig_type))
-      && bidi_it->next_for_ws.charpos < bidi_it->charpos
-      /* If this character is already at base level, we don't need to
-        reset it, so avoid the potentially costly loop below.  */
-      && level != bidi_it->level_stack[0].level)
+      && bidi_it->next_for_ws.charpos < bidi_it->charpos)
     {
       int ch;
       ptrdiff_t clen = bidi_it->ch_len;
---------------

It fixes the bug.
  

In GNU Emacs 28.2 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.36, cairo version 1.17.6)
 of 2023-01-03 built on 2
Windowing system distributor 'The X.Org Foundation', version 11.0.12101006
System Description: Arch Linux

Configured using:
 'configure --sysconfdir=/etc --prefix=/usr --libexecdir=/usr/lib
 --localstatedir=/var --with-cairo --with-harfbuzz --with-libsystemd
 --with-modules --with-x-toolkit=gtk3 'CFLAGS=-march=x86-64
 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2
 -Wformat -Werror=format-security -fstack-clash-protection
 -fcf-protection -g
 -ffile-prefix-map=/build/emacs/src=/usr/src/debug/emacs -flto=auto'
 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto''

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE
XIM XPM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  delete-selection-mode: t
  cua-mode: t
  umath-mode: umath-insert-common
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug sendmail misearch multi-isearch
mule-util jka-compr nndraft nnmh nnfolder utf-7 rfc2104 gnutls
gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art
mm-uu mml2015 mm-view mml-smime smime dig nntp gnus-cache gnus-sum shr
kinsoku svg dom gnus-group gnus-undo gnus-start gnus-dbus dbus xml
gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo parse-time iso8601
gnus-spec gnus-int gnus-range message dired dired-loaddefs rfc822 mml
mml-sec epa mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader gnus-win gnus nnheader gnus-util rmail
rmail-loaddefs rfc2047 rfc2045 ietf-drums time-date mail-utils mm-util
mail-prsvr display-fill-column-indicator display-line-numbers delsel
cua-base cus-load lsp-mode lsp-protocol help-mode xref project
tree-widget wid-edit spinner pcase network-stream puny nsm rmc
markdown-mode rx color thingatpt noutline outline lv inline imenu ht
filenotify f f-shortdoc shortdoc s ewoc epg rfc6068 epg-config dash
compile text-property-search comint ansi-color ring finder-inf edmacro
kmacro easy-mmode derived info cl package browse-url url url-proxy
url-privacy url-expand url-methods url-history url-cookie url-domsuf
url-util mailcap url-handlers url-parse auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json subr-x map
url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads dbusbind
inotify lcms2 dynamic-setting system-font-setting font-render-setting
cairo move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 386790 21130)
 (symbols 48 30110 6)
 (strings 32 132616 6853)
 (string-bytes 1 3608021)
 (vectors 16 51861)
 (vector-slots 8 610382 31136)
 (floats 8 356 324)
 (intervals 56 4882 0)
 (buffers 992 21))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#61269; Package emacs. (Sat, 04 Feb 2023 11:39:01 GMT)

Message #8 received at 61269 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Halim <mhalimln <at> outlook.com>
Cc: 61269 <at> debbugs.gnu.org
Subject: Re: bug#61269: 28.2;
 Sequence of spaces preceding tab in bidirectional line
Date: Sat, 04 Feb 2023 13:38:20 +0200

Thanks.

You are right that the logic there was flawed.  However, just removing
the base-level test is sub-optimal: that test was added to speed up
redisplay when the buffer has a lot of control characters (e.g.,
binary null bytes) that don't need to be reordered; see bug#22739.

So I have installed a slightly different change, reproduced below;
please see that it solves the problem, including (presumably) some
real-life problems you had in displaying RTL text with embedded TABs.

diff --git a/src/bidi.c b/src/bidi.c
index e012512..93875d2 100644
--- a/src/bidi.c
+++ b/src/bidi.c
@@ -3300,12 +3300,15 @@ bidi_level_of_next_char (struct bidi_it *bidi_it)
      it belongs to a sequence of WS characters preceding a newline
      or a TAB or a paragraph separator.  */
   if ((bidi_it->orig_type == NEUTRAL_WS
-       || bidi_it->orig_type == WEAK_BN
+       || (bidi_it->orig_type == WEAK_BN
+	   /* If this BN character is already at base level, we don't
+	      need to consider resetting it, since I1 and I2 below
+	      will not change the level, so avoid the potentially
+	      costly loop below.  */
+	   && level != bidi_it->level_stack[0].level)
        || bidi_isolate_fmt_char (bidi_it->orig_type))
-      && bidi_it->next_for_ws.charpos < bidi_it->charpos
-      /* If this character is already at base level, we don't need to
-	 reset it, so avoid the potentially costly loop below.  */
-      && level != bidi_it->level_stack[0].level)
+      /* This means the informaition about WS resolution is not valid.  */
+      && bidi_it->next_for_ws.charpos < bidi_it->charpos)
     {
       int ch;
       ptrdiff_t clen = bidi_it->ch_len;
@@ -3340,7 +3343,7 @@ bidi_level_of_next_char (struct bidi_it *bidi_it)
       || bidi_it->orig_type == NEUTRAL_S
       || bidi_it->ch == '\n' || bidi_it->ch == BIDI_EOB
       || ((bidi_it->orig_type == NEUTRAL_WS
-	   || bidi_it->orig_type == WEAK_BN
+	   || bidi_it->orig_type == WEAK_BN /* L1/Retaining */
 	   || bidi_isolate_fmt_char (bidi_it->orig_type)
 	   || bidi_explicit_dir_char (bidi_it->ch))
 	  && (bidi_it->next_for_ws.type == NEUTRAL_B




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#61269; Package emacs. (Sun, 05 Feb 2023 17:04:01 GMT)

Message #11 received at submit <at> debbugs.gnu.org:

From: Halim <mhalimln <at> outlook.com>
To: bug-gnu-emacs <at> gnu.org
Subject: bug#61269: 28.2; Sequence of spaces preceding tab in bidirectional
 line
Date: Sun, 05 Feb 2023 23:55:38 +0700

  I have done the same test as before, and your patch does fix
the problem. Unfortunately I never had any real-life problems, as I
do not write any bidi text (I do write some, but only to help my
understanding of the UBA), so I can't give any result on this.

Thanks.




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sun, 05 Feb 2023 17:18:02 GMT)

Notification sent to Halim <mhalimln <at> outlook.com>:
bug acknowledged by developer. (Sun, 05 Feb 2023 17:18:02 GMT)

Message #16 received at 61269-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Halim <mhalimln <at> outlook.com>
Cc: 61269-done <at> debbugs.gnu.org
Subject: Re: bug#61269: 28.2;
 Sequence of spaces preceding tab in bidirectional line
Date: Sun, 05 Feb 2023 19:17:52 +0200
> From: Halim <mhalimln <at> outlook.com>
> Date: Sun, 05 Feb 2023 23:55:38 +0700
> 
>   I have done the same test as before, and your patch does fix
> the problem. Unfortunately I never had any real-life problems, as I
> do not write any bidi text (I do write some, but only to help my
> understanding of the UBA), so I can't give any result on this.

OK, thanks.  So I'm closing this bug; feel free to reopen if you
encounter some similar issues with whitespace and TABs in bidi
context.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 06 Mar 2023 12:24:07 GMT)

This bug report was last modified 1 year and 23 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.