GNU bug report logs - #35721
27.0.50; Strange Arabic shaping behavior

Package: emacs;

Reported by: "Basil L. Contovounesios" <contovob <at> tcd.ie>

Date: Mon, 13 May 2019 22:20:02 UTC

Severity: normal

Tags: notabug

Found in version 27.0.50

Done: "Basil L. Contovounesios" <contovob <at> tcd.ie>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 35721 in the body.
You can then email your comments to 35721 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Mon, 13 May 2019 22:20:02 GMT)

Acknowledgement sent to "Basil L. Contovounesios" <contovob <at> tcd.ie>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 13 May 2019 22:20:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.50; Strange Arabic shaping behavior
Date: Mon, 13 May 2019 23:09:06 +0100
I see the following on the master, harfbuzz, and emacs-26 branches
(precise versions follow my signature), but I'm not sure how much of
this is expected or due to e.g. my font.

0. emacs -Q
1. C-x 8 RET 0634 RET

The "tail" of the sheen is truncated by the fringe:

[Screenshot: 01.png]
2. C-a C-u C-x =

--8<---------------cut here---------------start------------->8---
             position: 146 of 146 (99%), column: 0
            character: ش‎ (displayed as ش‎) (codepoint 1588, #o3064, #x634)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0x0634
               script: arabic
               syntax: w 	which means: word
             category: .:Base, R:Right-to-left (strong), b:Arabic
             to input: type "C-x 8 RET 634" or "C-x 8 RET ARABIC LETTER SHEEN"
          buffer code: #xD8 #xB4
            file code: #xD8 #xB4 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-PfEd-DejaVu Sans Mono-normal-normal-normal-*-19-*-*-*-m-0-iso10646-1 (#x46A)

Character code properties: customize what to show
  name: ARABIC LETTER SHEEN
  general-category: Lo (Letter, Other)
  decomposition: (1588) ('ش')

There are text properties here:
  fontified            nil
--8<---------------cut here---------------end--------------->8---

3. SPC

The "tail" of the sheen becomes visible, but falls outside of the box
cursor:

[Screenshot: 02.png]
4. C-x 8 RET 0643 RET

The kaf is correctly shaped in its initial form:

[Screenshot: 03.png]
5. C-SPC

The kaf changes to its isolated form:

[Screenshot: 04.png]
6. C-g C-a C-k
7. C-u C-\ arabic RET
8. a ; RET

The sheen is correctly shaped in its initial form and the kaf is
truncated by the fringe:

[Screenshot: 05.png]
9. a ; RET

The first sheen unexpectedly changes to its isolated form:

[Screenshot: 06.png]
I occasionally see this happen even without typing anything, as if by a
timer, but I'm not sure how to reproduce it.  I think, without being
100% certain, that it's only happened while using the 'arabic' input
method.

10. a

The first sheen reverts to its initial form:

[Screenshot: 07.png]
Any insights?  Thanks,

-- 
Basil

In GNU Emacs 27.0.50 (build 40, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2019-05-13 built on thunk
Repository revision: a1e5cce99b75c1bd50995b7b4d81423b1296fa60
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12003000
System Description: Debian GNU/Linux buster/sid

Configured using:
 'configure 'CC=ccache gcc' 'CFLAGS=-O2 -march=native' --config-cache
 --prefix=/home/blc/.local --with-mailutils --with-x-toolkit=lucid
 --with-modules --with-file-notification=yes --with-x'

Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS
GLIB NOTIFY INOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT
LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS
LIBSYSTEMD JSON PDUMPER LCMS2 GMP

Important settings:
  value of $LANG: en_IE.UTF-8
  locale-coding-system: utf-8-unix


In GNU Emacs 27.0.50 (build 2, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2019-05-13 built on thunk
Repository revision: 5d7dafacf4afc888511649f6fc24c28210cd0dfc
Repository branch: harfbuzz
Windowing system distributor 'The X.Org Foundation', version 11.0.12003000
System Description: Debian GNU/Linux buster/sid

Configured using:
 'configure 'CC=ccache gcc' 'CFLAGS=-O0 -g3 -ggdb -gdwarf-4'
 --config-cache --prefix=/home/blc/.local --program-suffix=-harfbuzz
 --enable-checking=yes,glyphs --enable-check-lisp-object-type
 --with-mailutils --with-x-toolkit=lucid --with-modules
 --with-file-notification=yes --with-x'

Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS
GLIB NOTIFY INOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ
M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES
THREADS LIBSYSTEMD JSON PDUMPER LCMS2 GMP


In GNU Emacs 26.2.50 (build 2, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2019-04-30 built on thunk
Repository revision: c26d452ae15a74f0eeec53ba529eebaa95eb5489
Windowing system distributor 'The X.Org Foundation', version 11.0.12003000
System Description:	Debian GNU/Linux buster/sid

Configured using:
 'configure 'CC=ccache gcc' 'CFLAGS=-O0 -g3 -ggdb -gdwarf-4'
 --config-cache --prefix=/home/blc/.local --program-suffix=26
 --enable-checking=yes,glyphs --enable-check-lisp-object-type
 --with-mailutils --with-x-toolkit=lucid --with-modules
 --with-file-notification=yes --with-x'

Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS
GLIB NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT
ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS LIBSYSTEMD
LCMS2

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Tue, 14 May 2019 15:11:02 GMT)

Message #8 received at 35721 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>,
 Behdad Esfahbod <behdad <at> behdad.org>, Kenichi Handa <handa <at> gnu.org>
Cc: 35721 <at> debbugs.gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Tue, 14 May 2019 18:10:19 +0300
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Date: Mon, 13 May 2019 23:09:06 +0100
> 
> I see the following on the master, harfbuzz, and emacs-26 branches
> (precise versions follow my signature), but I'm not sure how much of
> this is expected or due to e.g. my font.
> 
> 0. emacs -Q
> 1. C-x 8 RET 0634 RET
> 
> The "tail" of the sheen is truncated by the fringe:

After looking at the code and thinking about this, I think this is a
feature (as strange as it may sound); see below for why I think so.
And yes, it definitely depends on the font, in this case DejaVu Sans
Mono.  I don't see this with any other fixed-pitch font I have.

I'm not 100% sure I'm right here, so I CC Behdad and Kenichi in the
hope that they will comment on this.  Behdad, I see a very similar
issue with hb-view, when it renders this character using DejaVu Sans
Mono, so it isn't just an Emacs issue (and seeing what hb-view
produces actually made me think my opinion is correct about this).

> 3. SPC
> 
> The "tail" of the sheen becomes visible, but falls outside of the box
> cursor:

Yes, this particular font's glyph for sheen has a negative left side
bearing, which AFAIU means it extends beyond the box dimensions to the
left.

> 4. C-x 8 RET 0643 RET
> 
> The kaf is correctly shaped in its initial form:
> 
> 5. C-SPC
> 
> The kaf changes to its isolated form:

This is a different problem, related to how we redraw portions of the
buffer inside the region (more generally, those which have colors
different from the default face).

The problem is that we only pass to the shaping engine stretches of
text that have the same face.  The basic reason for that is that a
different face can use a different font, and we can only handle
character composition for characters supported by the same font.
Another fundamental reason is that the display engine processes text
in chunks that have the same face.  So when the active region, or some
other Emacs feature, paints portions of text in some non-default face,
we redraw the display, and pass to the shaping engine only the portion
that has that different face.  If that portion is a single character,
you will see that it loses its correct shape and is rendered in its
isolated form.  And if the colors change between two characters that
need to be shaped together, the shaping will break.

You can easily see this effect if you display HELLO, and then
shift-select portions of the Arabic greeting (or any other script that
is a heavy user of character compositions).

To fix this, we need some mechanism that will pass larger chunks of
text to the shaper in these cases, which will need some changes in how
the display engine iterates through buffer/string text when it
prepares them for display: we currently stop at every change of face.
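
A toy model can make the face-run problem above concrete. The sketch below is not Emacs or HarfBuzz code: it assumes a hand-written joining-type table covering only alef, sheen and kaf, ignores transparent marks, ligatures and everything else a real shaper does, and picks each letter's positional form from its immediate neighbours only. The face names and the `with_context` switch are hypothetical. Shaping a kaf followed by a sheen (the text produced by steps 3 and 4 of the report) as one run gives initial+final forms; cutting the run at a face boundary and shaping each piece on its own gives isolated forms; cutting it but still letting each piece see its neighbours as context keeps the correct forms.

```python
# Toy model of Arabic joining, NOT the full Unicode/HarfBuzz algorithm.
# Joining types (as in Unicode's ArabicShaping data) for three letters only:
# "D" = dual-joining, "R" = right-joining; anything else is treated as
# non-joining ("U").  Transparent marks are not modelled.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0634": "D",  # ARABIC LETTER SHEEN
    "\u0643": "D",  # ARABIC LETTER KAF
}


def joining_type(ch):
    return JOINING_TYPE.get(ch, "U")


def positional_forms(text, start=0, end=None):
    """Positional form of each character in text[start:end] (logical order).

    Characters of TEXT outside [start, end) serve only as context.
    """
    end = len(text) if end is None else end
    forms = []
    for i in range(start, end):
        t = joining_type(text[i])
        if t == "U":
            forms.append("none")
            continue
        prev_t = joining_type(text[i - 1]) if i > 0 else "U"
        next_t = joining_type(text[i + 1]) if i + 1 < len(text) else "U"
        joins_prev = prev_t == "D"  # only dual-joiners connect forward
        joins_next = t == "D" and next_t in ("D", "R")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


def face_runs(faces):
    """Yield (start, end) for each maximal run of identical faces."""
    start = 0
    for i in range(1, len(faces) + 1):
        if i == len(faces) or faces[i] != faces[start]:
            yield start, i
            start = i


def shape_per_face_run(text, faces, with_context):
    """Shape each face run separately, optionally showing it its neighbours."""
    forms = []
    for start, end in face_runs(faces):
        if with_context:
            forms += positional_forms(text, start, end)
        else:
            forms += positional_forms(text[start:end])
    return forms


kaf_sheen = "\u0643\u0634"     # kaf followed by sheen, in logical order
faces = ["default", "region"]  # region highlighting starts at the sheen
print(positional_forms(kaf_sheen))
# -> ['initial', 'final']
print(shape_per_face_run(kaf_sheen, faces, with_context=False))
# -> ['isolated', 'isolated']
print(shape_per_face_run(kaf_sheen, faces, with_context=True))
# -> ['initial', 'final']
```

The third call corresponds loosely to "passing larger chunks of text to the shaper": the runs are still processed one at a time, but each one can see the letters on either side of it.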

Patches to fix this are most welcome.

> I occasionally see this happen even without typing anything, as if by a
> timer, but I'm not sure how to reproduce it.  I think, without being
> 100% certain, that it's only happened while using the 'arabic' input
> method.

Maybe, but given my description above, I'm not surprised, because it's
enough that Emacs decides, for some reason, to redraw just that one
character.

Now to the original problem.  Let me turn the tables and ask you: what
did you expect to happen instead?  This is a fixed-pitch font, so how
can Emacs display a character that extends to the left from its box,
at the leftmost window coordinate?  It has no choice but to consider
its extension to be off-screen, as if the window were hscrolled.

The "normal" case for this character is to be part of R2L text, which
begins at the right window margin, and flows to the left.  In that
case, the extension will overlap the character cell of the next (in
the logical order) character.
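
The geometry behind this argument is simple arithmetic on the glyph metrics. The numbers below are hypothetical pixel values chosen for illustration (a 10 px character cell, a left side bearing of -3 px, 12 px of ink), not DejaVu Sans Mono's real metrics, and the helper functions are made up for this sketch rather than taken from Emacs. The ink starts at the pen origin plus the left side bearing, so a negative bearing pushes it to the left of the character's own cell.

```python
def ink_extent(origin_x, lsb, ink_width):
    """Return (left, right) x coordinates of a glyph's ink.

    ORIGIN_X is the left edge of the glyph's character cell (the pen
    position); a negative left side bearing LSB moves the ink left of it.
    """
    left = origin_x + lsb
    return left, left + ink_width


def clipped_on_left(origin_x, lsb, ink_width, window_left=0):
    """How much of the ink lies left of WINDOW_LEFT and so cannot be drawn."""
    left, right = ink_extent(origin_x, lsb, ink_width)
    return max(0, min(right, window_left) - left)


def cells_touched(origin_x, lsb, ink_width, cell_width):
    """First and last character-cell indices the ink overlaps.

    Cell -1 is just outside the window's left edge, as if hscrolled.
    """
    left, right = ink_extent(origin_x, lsb, ink_width)
    return left // cell_width, (right - 1) // cell_width


CELL, LSB, INK, COLUMNS = 10, -3, 12, 80  # hypothetical metrics, in pixels

# Left-to-right line: the sheen sits in column 0.
print(clipped_on_left(0, LSB, INK), cells_touched(0, LSB, INK, CELL))
# -> 3 (-1, 0)

# Right-to-left line: the sheen sits in the rightmost column.
rightmost = (COLUMNS - 1) * CELL
print(clipped_on_left(rightmost, LSB, INK), cells_touched(rightmost, LSB, INK, CELL))
# -> 0 (78, 79)
```

With these numbers, a sheen at column 0 of a left-to-right line loses 3 px of ink past the window edge (cell -1); the same overhang is why a box cursor drawn over the cell's width does not cover the tail. In the rightmost column of an 80-column right-to-left line nothing is clipped: the overhang lands in cell 78, the cell of the next character in logical order.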

So I think we have no bug here; we behave as expected.  If not, I'm
sure Behdad and Kenichi will correct me.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Tue, 14 May 2019 18:25:02 GMT)

Message #11 received at 35721 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: contovob <at> tcd.ie, behdad <at> behdad.org, handa <at> gnu.org,
 Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 35721 <at> debbugs.gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Tue, 14 May 2019 21:23:46 +0300
Adding Khaled.

The original message with images can be viewed here:

  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35721#5


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Wed, 15 May 2019 23:03:02 GMT)

Message #14 received at 35721 <at> debbugs.gnu.org:

From: Behdad Esfahbod <behdad <at> behdad.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: contovob <at> tcd.ie, Kenichi Handa <handa <at> gnu.org>,
 Khaled Hosny <dr.khaled.hosny <at> gmail.com>, 35721 <at> debbugs.gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Wed, 15 May 2019 16:02:19 -0700
Hi Eli,

Pretty much what you said: if the font is not monospaced, there's nothing
we can do.  Also, if you don't pass neighboring context text to HarfBuzz,
again, nothing we can do.



-- 
behdad
http://behdad.org/

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Thu, 16 May 2019 13:29:02 GMT)

Message #17 received at 35721 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Behdad Esfahbod <behdad <at> behdad.org>
Cc: contovob <at> tcd.ie, handa <at> gnu.org, dr.khaled.hosny <at> gmail.com,
 35721 <at> debbugs.gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Thu, 16 May 2019 16:28:10 +0300
> From: Behdad Esfahbod <behdad <at> behdad.org>
> Date: Wed, 15 May 2019 16:02:19 -0700
> Cc: contovob <at> tcd.ie, Kenichi Handa <handa <at> gnu.org>, 
> 	Khaled Hosny <dr.khaled.hosny <at> gmail.com>, 35721 <at> debbugs.gnu.org
> 
> Pretty much what you said: if the font is not monospaced, there's nothing we can do.

Thanks.  Let me be sure I understand what you are saying.  If I invoke
hb-view like this:

  hb-view -u 0x0634 -O png -o sheen.png DejaVuSansMono.ttf

then the result is the following PNG image:

[Attachment: sheen.png]
Do I understand you correctly that this is the expected result with
that font?  Because what Emacs displays, even without HarfBuzz as its
shaping engine, looks exactly like that: the leftmost part of the
letter is off-screen.

> Also, if you don't pass neighboring context text to HarfBuzz, again,
> nothing we can do.

I believe this is about the other part: displaying text which is
partially selected, when selection is shown as a different background
color.  You are saying that to do its job, a shaping engine needs to
see the entire text, not just the part which has the same colors.
Right?

Thanks again for your comments.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Thu, 16 May 2019 13:46:01 GMT)

Message #20 received at 35721 <at> debbugs.gnu.org:

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: contovob <at> tcd.ie, Behdad Esfahbod <behdad <at> behdad.org>, 35721 <at> debbugs.gnu.org,
 handa <at> gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Thu, 16 May 2019 15:45:08 +0200
On Thu, May 16, 2019 at 04:28:10PM +0300, Eli Zaretskii wrote:
> > From: Behdad Esfahbod <behdad <at> behdad.org>
> >
> > Also, if you don't pass neighboring context text to HarfBuzz, again,
> > nothing we can do.
> 
> I believe this is about the other part: displaying text which is
> partially selected, when selection is shown as a different background
> color.  You are saying that to do its job, a shaping engine needs to
> see the entire text, not just the part which has the same colors.
> Right?

There are two kinds of formatting changes: those that involve a different
font and those that don’t.

For changes that involve a different font (using a new typeface, making
text bold or italic etc.) the text on either side of the change has to
be shaped separately, but if enough context is given then HarfBuzz can
at least do basic Arabic shaping (selecting the right form for the
context).

Changes that do not involve a different font, like color, underline,
background etc., should not break the shaping at all, and the text should
be shaped together. The formatting information should be kept along with
the text and applied after shaping, i.e. if characters N to N+5 are red,
then after shaping the glyphs belonging to characters N to N+5 should be
drawn red.
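
The "shape first, apply colors afterwards" approach can be sketched as follows. HarfBuzz does report, for every output glyph, a cluster value linking it back to the input text; this sketch assumes those values have already been converted to character indices. The `Glyph` class, `color_glyphs` and the glyph names are hypothetical stand-ins, not HarfBuzz or Emacs API. All it shows is that, once colors are looked up after shaping, the forms chosen by the shaper no longer depend on where the region begins.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    name: str     # glyph picked by the shaper (names here are made up)
    cluster: int  # index of the first character this glyph came from


def color_glyphs(glyphs, char_colors):
    """Pair each shaped glyph with the color of its cluster's first character.

    The text is shaped once, as a whole; colors, underlines, backgrounds
    and the like are looked up only afterwards, so they cannot change the
    forms the shaper chose.  A ligature spanning a color change simply
    takes the color of its first character in this simplified version.
    """
    return [(g.name, char_colors[g.cluster]) for g in glyphs]


# Hypothetical shaper output for kaf followed by sheen, shaped as one run.
shaped = [Glyph("kaf.init", 0), Glyph("sheen.fina", 1)]
# The active region covers only the sheen (character 1).
print(color_glyphs(shaped, ["default", "region"]))
# -> [('kaf.init', 'default'), ('sheen.fina', 'region')]
```

A real renderer would probably want something smarter for ligatures that straddle a color boundary, such as drawing parts of one glyph in different colors; this sketch does not attempt that.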

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Thu, 16 May 2019 14:10:02 GMT)

Message #23 received at 35721 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: contovob <at> tcd.ie, behdad <at> behdad.org, 35721 <at> debbugs.gnu.org, handa <at> gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Thu, 16 May 2019 17:08:47 +0300
> Date: Thu, 16 May 2019 15:45:08 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: Behdad Esfahbod <behdad <at> behdad.org>, contovob <at> tcd.ie, handa <at> gnu.org,
> 	35721 <at> debbugs.gnu.org
> 
> > I believe this is about the other part: displaying text which is
> > partially selected, when selection is shown as a different background
> > color.  You are saying that to do its job, a shaping engine needs to
> > see the entire text, not just the part which has the same colors.
> > Right?
> 
> There are two kinds of formatting changes: those that involve a different
> font and those that don’t.
> 
> For changes that involve a different font (using a new typeface, making
> text bold or italic etc.) the text on either side of the change has to
> be shaped separately, but if enough context is given then HarfBuzz can
> at least do basic Arabic shaping (selecting the right form for the
> context).
> 
> Changes that do not involve a different font, like color, underline,
> background etc., should not break the shaping at all, and the text should
> be shaped together. The formatting information should be kept along with
> the text and applied after shaping, i.e. if characters N to N+5 are red,
> then after shaping the glyphs belonging to characters N to N+5 should be
> drawn red.

Right, thanks.

The main point of this bug report is about the other part, though: how
sheen should be displayed when using DejaVu Sans Mono.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Thu, 16 May 2019 14:21:02 GMT)

Message #26 received at 35721 <at> debbugs.gnu.org:

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: contovob <at> tcd.ie, behdad <at> behdad.org, 35721 <at> debbugs.gnu.org, handa <at> gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Thu, 16 May 2019 16:20:00 +0200
On Thu, May 16, 2019 at 05:08:47PM +0300, Eli Zaretskii wrote:
> The main point of this bug report is about the other part, though: how
> sheen should be displayed when using DejaVu Sans Mono.

I don’t think there is a right answer here. Glyphs can have negative right
or left side bearings, so if the graphics context has no room on either
side, then I think it is expected that the glyph would be truncated. This
is not Arabic-specific (e.g. an italic f often has a negative right side
bearing, in variable-width fonts at least).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Thu, 16 May 2019 14:38:02 GMT)

Message #29 received at 35721 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: contovob <at> tcd.ie, behdad <at> behdad.org, 35721 <at> debbugs.gnu.org, handa <at> gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Thu, 16 May 2019 17:36:56 +0300
tags 35721 notabug
thanks

> Date: Thu, 16 May 2019 16:20:00 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: behdad <at> behdad.org, contovob <at> tcd.ie, handa <at> gnu.org,
> 	35721 <at> debbugs.gnu.org
> 
> On Thu, May 16, 2019 at 05:08:47PM +0300, Eli Zaretskii wrote:
> > The main point of this bug report is about the other part, though: how
> > sheen should be displayed when using DejaVu Sans Mono.
> 
> I don’t think there is a right answer here. Glyphs can have negative right
> or left side bearings, so if the graphics context has no room on either
> side, then I think it is expected that the glyph would be truncated. This
> is not Arabic-specific (e.g. an italic f often has a negative right side
> bearing, in variable-width fonts at least).

OK, thanks.

Basil, is it okay with you to close this bug, given these comments?




Added tag(s) notabug. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 16 May 2019 14:38:04 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Thu, 16 May 2019 20:48:02 GMT)

Message #34 received at 35721 <at> debbugs.gnu.org:

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Behdad Esfahbod <behdad <at> behdad.org>, 35721 <at> debbugs.gnu.org,
 Kenichi Handa <handa <at> gnu.org>
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Thu, 16 May 2019 21:47:04 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Date: Mon, 13 May 2019 23:09:06 +0100
>> 
>> I see the following on the master, harfbuzz, and emacs-26 branches
>> (precise versions follow my signature), but I'm not sure how much of
>> this is expected or due to e.g. my font.
>> 
>> 0. emacs -Q
>> 1. C-x 8 RET 0634 RET
>> 
>> The "tail" of the sheen is truncated by the fringe:
>
> After looking at the code and thinking about this, I think this is a
> feature (as strange as it may sound); see below for why I think so.
> And yes, it definitely depends on the font, in this case DejaVu Sans
> Mono.  I don't see this with any other fixed-pitch font I have.
>
> I'm not 100% sure I'm right here, so I CC Behdad and Kenichi in the
> hope that they will comment on this.  Behdad, I see a very similar
> issue with hb-view, when it renders this character using DejaVu Sans
> Mono, so it isn't just an Emacs issue (and seeing what hb-view
> produces actually made me think my opinion is correct about this).
>
>> 3. SPC
>> 
>> The "tail" of the sheen becomes visible, but falls outside of the box
>> cursor:
>
> Yes, this particular font's glyph for sheen has a negative left side
> bearing, which AFAIU means it extends beyond the box dimensions to the
> left.

OK.

>> 4. C-x 8 RET 0643 RET
>> 
>> The kaf is correctly shaped in its initial form:
>> 
>> 5. C-SPC
>> 
>> The kaf changes to its isolated form:
>
> This is a different problem, related to how we redraw portions of the
> buffer inside the region (more generally, those which have colors
> different from the default face).
>
> The problem is that we only pass to the shaping engine stretches of
> text that have the same face.  The basic reason for that is that a
> different face can use a different font, and we can only handle
> character composition for characters supported by the same font.
> Another fundamental reason is that the display engine processes text
> in chunks that have the same face.  So when the active region, or some
> other Emacs feature, paints portions of text in some non-default face,
> we redraw the display, and pass to the shaping engine only the portion
> that has that different face.  If that portion is a single character,
> you will see that it loses its correct shape and is rendered in its
> isolated form.  And if the colors change between two characters that
> need to be shaped together, the shaping will break.
>
> You can easily see this effect if you display HELLO, and then
> shift-select portions of the Arabic greeting (or any other script that
> is a heavy user of character compositions).
>
> To fix this, we need some mechanism that will pass larger chunks of
> text to the shaper in these cases, which will need some changes in how
> the display engine iterates through buffer/string text when it
> prepares it for display: we currently stop at every change of face.

Makes sense, thanks for explaining.

> Patches to fix this are most welcome.

I don't think I need to tell you not to hold your breath.
Maybe one day...

>> I occasionally see this happen even without typing anything, as if by a
>> timer, but I'm not sure how to reproduce it.  I think, without being
>> 100% certain, that it's only happened while using the 'arabic' input
>> method.
>
> Maybe, but given my description above, I'm not surprised, because it's
> enough that Emacs decides, for some reason, to redraw just that one
> character.

Your description above explains the case where the mark is activated in
the middle of a composition, but I don't think it explains why inserting
characters further down the buffer would affect compositions on previous
lines, as in steps 9 and 10 in the OP, where no face change is involved.

> Now to the original problem.  Let me turn the tables and ask you: what
> did you expect to happen instead?  This is a fixed-pitch font, so how
> can Emacs display a character that extends to the left of its box,
> at the left-most window coordinate?  It has no choice but to consider
> its extension to be off-screen, as if the window was hscrolled.
>
> The "normal" case for this character is to be part of R2L text, which
> begins at the right window margin, and flows to the left.  In that
> case, the extension will overlap the character cell of the next (in
> the logical order) character.

Right.  I only noticed the truncation while writing up the rest of the
report, but it seemed relevant enough to mention.  Following your
explanation, the current truncating behaviour seems reasonable to me.

> So I think we have no bug here, we behave as expected.  If not, I'm
> sure Behdad and Kenichi will correct me.

Thanks,

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Thu, 16 May 2019 20:55:02 GMT)

Message #37 received at 35721 <at> debbugs.gnu.org:

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Khaled Hosny <dr.khaled.hosny <at> gmail.com>, behdad <at> behdad.org,
 35721 <at> debbugs.gnu.org, handa <at> gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Thu, 16 May 2019 21:54:10 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

> tags 35721 notabug
> thanks
>
>> Date: Thu, 16 May 2019 16:20:00 +0200
>> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
>> Cc: behdad <at> behdad.org, contovob <at> tcd.ie, handa <at> gnu.org,
>> 	35721 <at> debbugs.gnu.org
>> 
>> On Thu, May 16, 2019 at 05:08:47PM +0300, Eli Zaretskii wrote:
>> > The main point of this bug report is about the other part, though: how
>> > sheen should be displayed when using DejaVu Sans Mono.
>> 
>> I don’t think there is a right answer here. Glyphs can have negative
>> right or left side bearings, so if the graphics context has no room on
>> either side then I think it is expected that the glyph would be
>> truncated. This is not Arabic specific (e.g. an italic f often has a
>> negative right side bearing, in variable-width fonts at least).
>
> OK, thanks.
>
> Basil, is it okay with you to close this bug, given these comments?

Thanks to everyone for your explanations.  I now understand why the
truncation happens, and why mark activation is expected to affect
character composition.

The issue that prompted this report, however, is the alternating
toggling of certain character compositions while typing in a separate
part of the buffer (steps 9 and 10 in the OP), where no face change is
involved.  I'm sorry for not making this clearer from the outset.

If you think this is unsurprising behaviour given the display engine's
current implementation, I don't mind if you close this issue.
Otherwise, perhaps either this issue can be retitled, or I can submit a
new issue focussing only on that part of the OP.

Sorry for the confusion, and thanks again.

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Fri, 17 May 2019 06:32:02 GMT)

Message #40 received at 35721 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, 35721 <at> debbugs.gnu.org,
 handa <at> gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Fri, 17 May 2019 09:31:18 +0300

> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: Khaled Hosny <dr.khaled.hosny <at> gmail.com>,  <behdad <at> behdad.org>,  <handa <at> gnu.org>,  <35721 <at> debbugs.gnu.org>
> Date: Thu, 16 May 2019 21:54:10 +0100
> 
> The issue that prompted this report, however, is the alternating
> toggling of certain character compositions while typing in a separate
> part of the buffer (steps 9 and 10 in the OP), where no face change is
> involved.  I'm sorry for not making this clearer from the outset.
> 
> If you think this is unsurprising behaviour given the display engine's
> current implementation, I don't mind if you close this issue.

I'm quite sure it is the result of how the display of complex scripts
is implemented.  However, ...

> Otherwise, perhaps either this issue can be retitled, or I can submit a
> new issue focussing only on that part of the OP.

... I think it would be best to file a new bug report, which is _only_
about the above-mentioned alternations in display of composed
characters, with specific examples only for that issue.  Then these
examples could be analyzed, and either (1) we find some bug that will
then be fixed, or (2) we conclude that these are artifacts of the
current implementation of the display engine, and make this a wishlist
bug report.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35721; Package emacs. (Mon, 20 May 2019 19:09:02 GMT)

Message #43 received at 35721 <at> debbugs.gnu.org:

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, 35721 <at> debbugs.gnu.org,
 handa <at> gnu.org
Subject: Re: bug#35721: 27.0.50; Strange Arabic shaping behavior
Date: Mon, 20 May 2019 20:07:55 +0100

close 35721
quit

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Cc: Khaled Hosny <dr.khaled.hosny <at> gmail.com>,  <behdad <at> behdad.org>,  <handa <at> gnu.org>,  <35721 <at> debbugs.gnu.org>
>> Date: Thu, 16 May 2019 21:54:10 +0100
>> 
>> The issue that prompted this report, however, is the alternating
>> toggling of certain character compositions while typing in a separate
>> part of the buffer (steps 9 and 10 in the OP), where no face change is
>> involved.  I'm sorry for not making this clearer from the outset.
>> 
>> If you think this is unsurprising behaviour given the display engine's
>> current implementation, I don't mind if you close this issue.
>
> I'm quite sure it is the result of how the display of complex scripts
> is implemented.  However, ...
>
>> Otherwise, perhaps either this issue can be retitled, or I can submit a
>> new issue focussing only on that part of the OP.
>
> ... I think it would be best to file a new bug report, which is _only_
> about the above-mentioned alternations in display of composed
> characters, with specific examples only for that issue.  Then these
> examples could be analyzed, and either (1) we find some bug that will
> then be fixed, or (2) we conclude that these are artifacts of the
> current implementation of the display engine, and make this a wishlist
> bug report.

Filed as bug#35811[1], so I'm closing this report.

[1]: https://debbugs.gnu.org/35811

Thanks,

-- 
Basil




bug closed, send any further explanations to 35721 <at> debbugs.gnu.org and "Basil L. Contovounesios" <contovob <at> tcd.ie> Request was from "Basil L. Contovounesios" <contovob <at> tcd.ie> to control <at> debbugs.gnu.org. (Mon, 20 May 2019 19:09:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 18 Jun 2019 11:24:05 GMT)



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.