GNU bug report logs - #11860
24.1; Arabic - Harakat (diacritics, short vowels) don't appear

Package: emacs;

Reported by: Steffan <smias <at> yandex.ru>

Date: Wed, 4 Jul 2012 18:43:12 UTC

Severity: normal

Found in version 24.1

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 11860 in the body.
You can then email your comments to 11860 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Wed, 04 Jul 2012 18:43:12 GMT) Full text and rfc822 format available.

Acknowledgement sent to Steffan <smias <at> yandex.ru>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 04 Jul 2012 18:43:12 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.1;  Arabic - Harakat (diacritics, short vowels) don't appear
Date: Wed, 04 Jul 2012 11:17:49 +0200

Hello,

The diacritic characters (harakat, the short vowels of Arabic) don't appear on Windows. (On Linux it works fine.)

These are the unicode names:
U+064B (1611)	‏ً‎	Arabisches Fathatan	ARABIC FATHATAN
U+064C (1612)	‏ٌ‎	Arabisches Dammatan	ARABIC DAMMATAN
U+064D (1613)	‏ٍ‎	Arabisches Kasratan	ARABIC KASRATAN
U+064E (1614)	‏َ‎	Arabisches Fatha	ARABIC FATHA
U+064F (1615)	‏ُ‎	Arabisches Damma	ARABIC DAMMA
U+0650 (1616)	‏ِ‎	Arabisches Kasra	ARABIC KASRA
U+0651 (1617)	‏ّ‎	Arabisches Schadda	ARABIC SHADDA
U+0652 (1618)	‏ْ‎	Arabisches Sukun	ARABIC SUKUN

None of these characters appear, although I can see them briefly when the cursor is on one of them.

I've tried many fonts, but none of them works. These special characters are in the file; Emacs doesn't lose them. If I copy the text to another editor, I can see them.

Should I download other packages?


Thanks

Greetings

Steffan

 -- 
In GNU Emacs 24.1.1 (i386-mingw-nt6.1.7600)
 of 2012-06-10 on MARVIN
Windowing system distributor `Microsoft Corp.', version 6.1.7600
Configured using:
 `configure --with-gcc (4.6) --cflags
 -ID:/devel/emacs/libs/libXpm-3.5.8/include
 -ID:/devel/emacs/libs/libXpm-3.5.8/src
 -ID:/devel/emacs/libs/libpng-dev_1.4.3-1/include
 -ID:/devel/emacs/libs/zlib-dev_1.2.5-2/include
 -ID:/devel/emacs/libs/giflib-4.1.4-1/include
 -ID:/devel/emacs/libs/jpeg-6b-4/include
 -ID:/devel/emacs/libs/tiff-3.8.2-1/include
 -ID:/devel/emacs/libs/gnutls-3.0.9/include'

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: DEU
  value of $XMODIFIERS: nil
  locale-coding-system: cp1256
  default enable-multibyte-characters: t

Major mode: Fundamental

Minor modes in effect:
  text-scale-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
C-x C-f <up> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> <backspace> u <backspace> 
d o <tab> w . <backspace> / <backspace> <backspace> 
<backspace> <backspace> d d d <return> X \ <backspace> 
<backspace> C-x C-] C-x C-= C-x C-= g h h f C-x C-= 
C-x C-= C-= C-= C-= <backspace> <backspace> <backspace> 
<backspace> <backspace> <return> h g <backspace> <backspace> 
<backspace> <language-change> ا ل ج ز <backspace> <backspace> 
<backspace> <backspace> ا ل ع ل ا <backspace> <backspace> 
ل ا <backspace> <backspace> ر ب ي و <backspace> ة <right> 
ّ <right> <right> <left> <backspace> <delete> <delete> 
<delete> <delete> <backspace> <backspace> <backspace> 
<backspace> ت ا <backspace> <backspace> <backspace> 
ا ل ع ر ب ي ة <return> <return> ا ل ع َ ر َ ب ي ّ ة <return> 
C-SPC <up> <up> <down> <down> <right> <right> <right> 
<right> <right> <left> <left> <left> <right> <right> 
<right> <right> <right> <right> <right> <down> <left> 
<left> <right> <left> <left> <left> <left> <left> <left> 
<left> C-g C-g C-x C-s <return> <up> <left> <left> 
<left> <right> <left> <left> <left> <left> <left> <right> 
<right> <right> <left> <left> <left> <left> <up> <left> 
<left> <left> <left> <left> <left> <left> <left> <left> 
<up> <up> <up> <down> <return> <down> <down> <right> 
<right> <right> <right> <right> <right> <right> <right> 
<left> <left> <left> <left> <left> <left> <left> <right> 
<right> M-x <up> <down> 0 <backspace> - ل ا ع ل <tab> 
<backspace> <backspace> <backspace> <backspace> <language-change> 
b u <tab> g <tab> ß <backspace> <left> <left> <left> 
<left> r e <tab> <return>

Recent messages:
delete-backward-char: Beginning of buffer
Mark set
Quit [2 times]
Saving file c:/Users/q/ddd...
Wrote c:/Users/q/ddd
call-interactively: End of buffer
goto-history-element: Beginning of history; no preceding item
goto-history-element: End of history; no default available
Making completion list... [2 times]
call-interactively: End of buffer [2 times]

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail regexp-opt rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils help-mode easymenu view face-remap
time-date tooltip ediff-hook vc-hooks lisp-float-type mwheel dos-w32
disp-table ls-lisp w32-win w32-vars tool-bar dnd fontset image fringe
lisp-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer loaddefs button faces cus-face files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process multi-tty emacs)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Wed, 04 Jul 2012 20:27:01 GMT) Full text and rfc822 format available.

Message #8 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Steffan <smias <at> yandex.ru>
Cc: 11860 <at> debbugs.gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Wed, 04 Jul 2012 23:22:00 +0300

> From: Steffan <smias <at> yandex.ru>
> Date: Wed, 04 Jul 2012 11:17:49 +0200
> 
> Hello,
> 
> The diacritic characters (harakat, the short vowels of Arabic) don't appear on Windows. (On Linux it works fine.)
> 
> These are the unicode names:
> U+064B (1611)	‏ً‎	Arabisches Fathatan	ARABIC FATHATAN
> U+064C (1612)	‏ٌ‎	Arabisches Dammatan	ARABIC DAMMATAN
> U+064D (1613)	‏ٍ‎	Arabisches Kasratan	ARABIC KASRATAN
> U+064E (1614)	‏َ‎	Arabisches Fatha	ARABIC FATHA
> U+064F (1615)	‏ُ‎	Arabisches Damma	ARABIC DAMMA
> U+0650 (1616)	‏ِ‎	Arabisches Kasra	ARABIC KASRA
> U+0651 (1617)	‏ّ‎	Arabisches Schadda	ARABIC SHADDA
> U+0652 (1618)	‏ْ‎	Arabisches Sukun	ARABIC SUKUN
> 
> None of these characters appear, although I can see them briefly when the cursor is on one of them.

Please show an example of text where you'd like them to show.  (I
don't speak Arabic.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 05 Jul 2012 17:21:01 GMT) Full text and rfc822 format available.

Message #11 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Steffan <smias <at> yandex.ru>, Kenichi Handa <handa <at> m17n.org>
Cc: 11860 <at> debbugs.gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 05 Jul 2012 20:16:05 +0300

[Please always CC the bug address, so that all this information gets
recorded by the bug tracker.]

> From: Steffan <smias <at> yandex.ru>
> Date: Thu, 05 Jul 2012 17:39:55 +0200
> 
> When I choose the input method "arabic" and then type only the two characters "u" and then "X" (capital letter), I get:
> - for 'u': as expected, "Ayin" (it looks like a mirror-inverted "3")
> - and for 'X': nothing! The diacritic sign "Sukun" should appear above the letter "Ayin". ("Sukun" looks like a small circle.)
 
Yes, I see this also.  Strange, it sounds like we compose these two
characters, but the resulting grapheme cluster is incorrect.  I hope
Handa-san (CC'ed) could look into this.

> A SIMILAR BUG IN HEBREW:
> 
> It's strange:
> 
> I choose "hebrew-full" as the input method.
> 
> - After typing 'f' I get KAF
> - then by typing 'd' I get GIMMEL
> - and after typing 'D' I get the "three-point sign" (HEBREW POINT QUBUTS), not below the GIMMEL but below the KAF!
> - and if I then type anything else (like DALET), the three points disappear!

This I don't see on my Windows machine.  Hebrew is displayed correctly
for me.  This is in "emacs -Q", right?  If not, please try in "emacs -Q".

> On Linux I installed some additional packages to display Arabic correctly. Maybe I have to install packages for Windows, but which?

No additional packages are needed for Windows; the Uniscribe shaping
engine, which Emacs uses on Windows, supports everything.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 05 Jul 2012 17:59:02 GMT) Full text and rfc822 format available.

Message #14 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, Kenichi Handa <handa <at> m17n.org>
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 05 Jul 2012 19:53:58 +0200


> [Please always CC the bug address, so that all this information gets
> recorded by the bug tracker.]
> 
>> From: Steffan <smias <at> yandex.ru>
>> Date: Thu, 05 Jul 2012 17:39:55 +0200
>>
>> When I choose the input method "arabic" and then type only the two characters "u" and then "X" (capital letter), I get:
>> - for 'u': as expected, "Ayin" (it looks like a mirror-inverted "3")
>> - and for 'X': nothing! The diacritic sign "Sukun" should appear above the letter "Ayin". ("Sukun" looks like a small circle.)
> 
> Yes, I see this also. Strange, it sounds like we compose these two
> characters, but the resulting grapheme cluster is incorrect. I hope
> Handa-san (CC'ed) could look into this.

The bug seems to be bigger (or smaller?): the "Harakat" (diacritics) don't appear at all. If you type 'X' ("Sukun", the small circle) in Arabic mode, you get a strange sign (one indicating that Emacs can't display the character); the same happens with the other "Harakat" ('Q', 'W', 'E', 'R', 'A', 'S', 'X', and '`', which is for ARABIC SHADDA).

>> A SIMILAR BUG IN HEBREW:
>>
>> It's strange:
>>
>> I choose "hebrew-full" as the input method.
>>
>> - After typing 'f' I get KAF
>> - then by typing 'd' I get GIMMEL
>> - and after typing 'D' I get the "three-point sign" (HEBREW POINT QUBUTS), not below the GIMMEL but below the KAF!
>> - and if I then type anything else (like DALET), the three points disappear!
> 
> This I don't see on my Windows machine. Hebrew is displayed correctly
> for me. This is in "emacs -Q", right? If not, please try in "emacs -Q".
I've tried it with "emacs -q": nothing has changed in either bug (Hebrew or Arabic).

> 
>> On Linux I installed some additional packages to display Arabic correctly. Maybe I have to install packages for Windows, but which?
> 
> No additional packages are needed for Windows, the Uniscribe shaping
> engine, which Emacs uses on Windows, supports everything.


 --

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 05 Aug 2012 05:36:01 GMT) Full text and rfc822 format available.

Message #17 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, Kenichi Handa <handa <at> m17n.org>
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 05 Aug 2012 07:27:49 +0200

Hello again.

I just wanted to ask you about the bug. Is there any news?

Thanks


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 05 Aug 2012 15:58:01 GMT) Full text and rfc822 format available.

Message #20 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Steffan <smias <at> yandex.ru>
Cc: 11860 <at> debbugs.gnu.org, handa <at> m17n.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 05 Aug 2012 18:49:56 +0300

> From: Steffan <smias <at> yandex.ru>
> Cc: 11860 <at> debbugs.gnu.org,Kenichi Handa <handa <at> m17n.org>
> Date: Sun, 05 Aug 2012 07:27:49 +0200
> 
> Hello again.
> 
> I just wanted to ask you about the bug. Is there any news?

No, sorry.  We are still waiting for Handa-san to look at this.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 13 Aug 2012 00:11:01 GMT) Full text and rfc822 format available.

Message #23 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa.kenichi <at> aist.go.jp>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 13 Aug 2012 09:02:06 +0900

In article <834nohbaor.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > I just wanted to ask you about the bug. Is there any news?

> No, sorry.  We are still waiting for Handa-san to look at this.

I'm very sorry for the late response.  I was just back from
Europe.  I'll start investigating this problem soon.

---
Kenichi Handa
handa <at> m17n.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sat, 18 Aug 2012 02:46:02 GMT) Full text and rfc822 format available.

Message #26 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Kenichi Handa <handa.kenichi <at> aist.go.jp>
Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sat, 18 Aug 2012 11:45:27 +0900

In article <tl7628nslq9.fsf <at> m17n.org>, Kenichi Handa <handa.kenichi <at> aist.go.jp> writes:

> I'm very sorry for the late response.  I was just back from
> Europe.  I'll start investigating this problem soon.

I first confirmed that the described problems with Arabic and
Hebrew occur with Emacs running on Windows.  Typing C-u C-x =
on the first Arabic character (U+0639) showed that the "Courier
New" font is used for it, and showed this composition
information:

Composed with the following character(s) "ْ" using this font:
  uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
by these glyphs:
  [0 1 1593 969 8 1 8 12 4 nil]
  [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

Next, I used the same "Courier New" font with Emacs running on
GNU/Linux, specifying it for Arabic like this:
  (set-fontset-font t 'arabic '("courier new"  . "unicode-bmp"))
With this setting, Emacs correctly displayed Arabic, and
typing C-u C-x = on U+0639 showed this composition
information:

Composed with the following character(s) "ْ" using this font:
  xft:-monotype-Courier New-normal-normal-normal-*-13-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 1593 969 8 2 8 4 4 nil]
  [0 1 1618 760 0 -6 -3 8 -11 [-9 2 0]]

Each vector is a GLYPH, described in the docstring of
composition-get-gstring as follows:
----------------------------------------------------------------------
GLYPH is a vector whose elements have this form:
    [ FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT
      [ [X-OFF Y-OFF WADJUST] | nil] ]
where
    FROM-IDX and TO-IDX are used internally and should not be touched.
    C is the character of the glyph.
    CODE is the glyph-code of C in FONT-OBJECT.
    WIDTH thru DESCENT are the metrics (in pixels) of the glyph.
    X-OFF and Y-OFF are offsets to the base position for the glyph.
    WADJUST is the adjustment to the normal width of the glyph.
----------------------------------------------------------------------
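
As a reading aid, the two diacritic glyph vectors quoted above can be decoded field by field using the layout from that docstring. The following is only a minimal sketch in C: the struct, the function names, and the variable names are hypothetical and are not Emacs's internal LGLYPH representation.

```c
#include <stdio.h>

/* Hypothetical flat view of one GLYPH vector, in the field order
   given by the composition-get-gstring docstring.  */
struct glyph_view
{
  int from_idx, to_idx;
  int c;     /* character code (C) */
  int code;  /* glyph code in the font (CODE) */
  int width, lbearing, rbearing, ascent, descent;
  int x_off, y_off, wadjust;
};

/* Width of the inked part of the glyph, in pixels.  */
int
ink_width (const struct glyph_view *g)
{
  return g->rbearing - g->lbearing;
}

/* Print the decoded fields of one glyph on a single line.  */
void
print_glyph (const char *label, const struct glyph_view *g)
{
  printf ("%s: C=U+%04X code=%d width=%d ink-width=%d "
          "ascent=%d descent=%d x-off=%d y-off=%d\n",
          label, (unsigned) g->c, g->code, g->width, ink_width (g),
          g->ascent, g->descent, g->x_off, g->y_off);
}

/* The second (diacritic) glyph from each of the two reports above.  */
const struct glyph_view w32_mark =
  { 0, 1, 1593, 760, 0, 3, 6, 12, 4, 1, -2, 0 };
const struct glyph_view xft_mark =
  { 0, 1, 1618, 760, 0, -6, -3, 8, -11, -9, 2, 0 };
```

Calling print_glyph ("uniscribe", &w32_mark) and print_glyph ("xft", &xft_mark) lists the fields side by side. Both marks have the same glyph code (760) and the same ink width, while the C field, the bearings, and the offsets differ.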

So, apparently Emacs on Windows and on GNU/Linux uses
different glyph metrics.  As the shaper on GNU/Linux
(the m17n-lib library) works correctly for the same font, and
other applications on Windows have no problem, I suspect
that the problem is in Emacs's interface with Uniscribe
(w32font.c or w32uniscribe.c).

If this problem happens only for bidi scripts, one
possibility is that Emacs's rendering engine (xdisp.c)
expects the glyphs in a glyph-string to be rendered in that order
from left to right, whereas the glyph-string returned on Windows
should be rendered in reverse order.  For instance, in the
above case, we may have to render the glyphs in this order
(diacritical mark first):

  [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
  [0 1 1593 969 8 1 8 12 4 nil]

I think further debugging must be done by those who
know Uniscribe, w32font.c, and w32uniscribe.c.

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sat, 18 Aug 2012 07:15:01 GMT) Full text and rfc822 format available.

Message #29 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sat, 18 Aug 2012 10:14:11 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> Date: Sat, 18 Aug 2012 11:45:27 +0900
> 
> So, apparently Emacs on Windows and GNU/Linux uses the
> different metrics of glyphs.  As the shaper on GNU/Linux
> (m17n-lib library) works correctly for the same font, and
> the other applications on Windows have no problem, I suspect
> that the problem is in Emacs' interface with uniscribe
> (w32font.c or w32uniscribe.c).
> 
> If this problem happens only for bidi scripts, one
> possibility is that Emacs's rendering engine (xdisp.c)
> expects glyphs in a glyph-string are rendered in that order
> from left to right, but the returned glyph-string on Windows
> should be rendered in reverse order.

If this is the case, how come we display the diacriticals correctly on
Windows in other cases, e.g. with Hebrew?

> For instance, in the above case, we may have to render glyphs in
> this order (diacritical mark first):
> 
>   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>   [0 1 1593 969 8 1 8 12 4 nil]

Could you propose a patch to try this idea?

> I think the further debugging must be done by those who
> knows uniscribe, w32font.c, and w32uniscribe.c.

Alas, I don't think we have such people on board, not with high enough
availability, anyway.  If you could kindly suggest where to look, what
variables to display, etc., I could try doing that, and reporting the
results.

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sat, 18 Aug 2012 09:20:02 GMT) Full text and rfc822 format available.

Message #32 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sat, 18 Aug 2012 18:19:19 +0900

In article <83txw0aczg.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > From: Kenichi Handa <handa <at> gnu.org>
> > Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> > Date: Sat, 18 Aug 2012 11:45:27 +0900
> > 
> > So, apparently Emacs on Windows and GNU/Linux uses the
> > different metrics of glyphs.  As the shaper on GNU/Linux
> > (m17n-lib library) works correctly for the same font, and
> > the other applications on Windows have no problem, I suspect
> > that the problem is in Emacs' interface with uniscribe
> > (w32font.c or w32uniscribe.c).
> > 
> > If this problem happens only for bidi scripts, one
> > possibility is that Emacs's rendering engine (xdisp.c)
> > expects glyphs in a glyph-string are rendered in that order
> > from left to right, but the returned glyph-string on Windows
> > should be rendered in reverse order.

> If this is the case, how come we display the diacriticals correctly on
> Windows in other cases, e.g. with Hebrew?

For Hebrew too, on Windows, I see the same problem as what
Steffan <smias <at> yandex.ru> reported:

In article <349641344144469 <at> web8d.yandex.ru>, Steffan <smias <at> yandex.ru> writes:
>>> I choose "hebrew-full" as input-method.
>>> 
>>> - After typing 'f' I get KAF
>>> - then by typing d I get GIMMEL
>>> - and after typing 'D' I get "the three point sign" (HEBREW POINT QUBUTS) not below the GIMMEL but the KAF!

If you don't see that problem, perhaps we are using
different fonts.  C-u C-x = tells me that "courier new" is
used for Hebrew too in my case.

> > For instance, in the above case, we may have to render glyphs in
> > this order (diacritical mark first):
> > 
> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> >   [0 1 1593 969 8 1 8 12 4 nil]

> Could you propose a patch to try this idea?

I have no idea.  :-(

> > I think the further debugging must be done by those who
> > knows uniscribe, w32font.c, and w32uniscribe.c.

> Alas, I don't think we have such people on board, not with high enough
> availability, anyway.  If you could kindly suggest where to look, what
> variables to display, etc., I could try doing that, and reporting the
> results.

I've just read the function uniscribe_shape in
w32uniscribe.c.  It seems that these are the key APIs of
Uniscribe:

* ScriptItemize -- no idea what this is
* ScriptShape -- perhaps for glyph substitution (GSUB features of opentype)
* ScriptPlace -- perhaps for glyph positioning (GPOS features of opentype)

So first, please check the documentation of ScriptShape
and figure out how it works for bidi scripts; i.e. what order
it expects for input, and what order it produces.

Next please find the meaning of this code fragment:

		  /* Detect clusters, for linking codes back to
		     characters.  */
		  if (attributes[j].fClusterStart)
		    {
		      while (from < nchars_in_run && clusters[from] < j)
			from++;
		      if (from >= nchars_in_run)
			from = to = nchars_in_run - 1;
		      else
			{
			  int k;
			  to = nchars_in_run - 1;
			  for (k = from + 1; k < nchars_in_run; k++)
			    {
			      if (clusters[k] > j)
				{
				  to = k - 1;
				  break;
				}
			    }
			}
		    }

The comment refers to "clusters".  I don't know exactly what
that means in Uniscribe, but I guess it relates to
grapheme clusters, and if so, this part seems to relate to
the ordering of glyphs in this kind of grapheme cluster:

  [0 1 1593 969 8 1 8 12 4 nil]
  [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sat, 18 Aug 2012 15:34:02 GMT) Full text and rfc822 format available.

Message #35 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sat, 18 Aug 2012 18:33:21 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, handa <at> gnu.org
> Date: Sat, 18 Aug 2012 18:19:19 +0900
> 
> > If this is the case, how come we display the diacriticals correctly on
> > Windows in other cases, e.g. with Hebrew?
> 
> For Hebrew too, on Windows, I see the same problem as what
> Steffan <smias <at> yandex.ru> reported:
> 
> In article <349641344144469 <at> web8d.yandex.ru>, Steffan <smias <at> yandex.ru> writes:
> >>> I choose "hebrew-full" as input-method.
> >>> 
> >>> - After typing 'f' I get KAF
> >>> - then by typing d I get GIMMEL
> >>> - and after typing 'D' I get "the three point sign" (HEBREW POINT QUBUTS) not below the GIMMEL but the KAF!
> 
> If you don't face with that problem, perhaps we are using
> the different font.  C-u C-x = tells that "courier new" is
> used for hebrew too in my case.

"Courier New" is the font that is used, and I still don't see the
problem.  The HEBREW POINT QUBUTS is displayed below GIMEL, as I'd
expect.

> I've just read the function uniscribe_shape in
> w32uniscribe.c.  It seems that these are the key API for
> uniscribe:
> 
> * ScriptItemize -- no idea what is this

It breaks the string to be displayed into individually shapeable
chunks, called "items".  We then pass each chunk to Uniscribe
separately for shaping.

> * ScriptShape -- perhaps for glyph substitution (GSUB features of opentype)

http://msdn.microsoft.com/en-us/library/windows/desktop/dd368564%28v=vs.85%29.aspx
says that this function "Generates glyphs and visual attributes for a
Unicode run".

> * ScriptPlace -- perhaps for glyph positioning (GPOS features of opentype)
> 
> So at first please check the documentation of ScriptShape
> and figure out how it works for bidi script; i.e. what order
> does it expect for input, and what order does it produce.

From the above page:

  If fLogicalOrder is set to TRUE in the SCRIPT_ANALYSIS structure, the
  function always generates glyphs in the same order as the original
  Unicode characters. If fLogicalOrder is set to FALSE, the function
  generates right-to-left items in reverse order so that ScriptTextOut
  does not have to reverse them before calling ExtTextOut.

And w32uniscribe.c sets that flag to TRUE a few lines before it calls
ScriptShape, because Emacs itself reorders characters:

  for (i = 0; i < nitems; i++)
    {
      int nglyphs, nchars_in_run;
      nchars_in_run = items[i+1].iCharPos - items[i].iCharPos;
      /* Force ScriptShape to generate glyphs in the same order as
	 they are in the input LGSTRING, which is in the logical
	 order.  */
      items[i].a.fLogicalOrder = 1;  <<<<<<<<<<<<<<<<<<<<<<<<

      /* Context may be NULL here, in which case the cache should be
         used without needing to select the font.  */
      result = ScriptShape (context, &(uniscribe_font->cache),
			    chars + items[i].iCharPos, nchars_in_run,
			    max_glyphs - done_glyphs, &(items[i].a),
			    glyphs, clusters, attributes, &nglyphs);
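
To make the quoted documentation concrete, here is a toy model of the ordering rule it describes. It is only a sketch under stated assumptions: the function emit_run and its parameters are hypothetical, nothing here calls Uniscribe, and a "run" is reduced to a plain array of glyph codes.

```c
#include <stddef.h>

/* Toy model, not a Uniscribe call: copy the N glyph codes of one run
   from GLYPHS (given in logical order) into OUT, in the order the
   quoted ScriptShape documentation describes.  RTL is nonzero for a
   right-to-left run; LOGICAL_ORDER stands in for fLogicalOrder.  */
void
emit_run (const int *glyphs, size_t n, int rtl, int logical_order,
          int *out)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* Only a right-to-left run with the flag unset is reversed.  */
      size_t src = (rtl && !logical_order) ? n - 1 - i : i;
      out[i] = glyphs[src];
    }
}
```

With the two glyph codes reported earlier in this thread (969 for the base letter, 760 for the mark), a right-to-left run comes out as 969, 760 when logical_order is nonzero, and as 760, 969 when it is zero.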

> Next please find the meaning of this code fragment:
> 
> 		  /* Detect clusters, for linking codes back to
> 		     characters.  */
> 		  if (attributes[j].fClusterStart)
> 		    {
> 		      while (from < nchars_in_run && clusters[from] < j)
> 			from++;
> 		      if (from >= nchars_in_run)
> 			from = to = nchars_in_run - 1;
> 		      else
> 			{
> 			  int k;
> 			  to = nchars_in_run - 1;
> 			  for (k = from + 1; k < nchars_in_run; k++)
> 			    {
> 			      if (clusters[k] > j)
> 				{
> 				  to = k - 1;
> 				  break;
> 				}
> 			    }
> 			}
> 		    }
> 
> The comment refers to "clusters".  I don't know exactly what
> that means in Uniscribe, but I guess it relates to
> grapheme clusters, and if so, this part seems to relate to
> the ordering of glyphs in this kind of grapheme cluster:
> 
>   [0 1 1593 969 8 1 8 12 4 nil]
>   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

No, they are character clusters, not grapheme clusters.  They could be
similar (or even identical) to grapheme clusters, but I'm not sure,
because I have a very vague idea about both.  You can find some
details here:

   http://msdn.microsoft.com/en-us/library/windows/desktop/dd317792%28v=vs.85%29.aspx

I hope this will allow you to understand the meaning of the above
code, by looking at how the results are used in the calls to
LGLYPH_SET_* macros right below the above snippet.
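
The quoted fragment can also be exercised on its own with small hand-made inputs. The sketch below wraps the same loop in a standalone function. It rests on assumptions: the function name, the plain int arrays standing in for the clusters and attributes[j].fClusterStart data, and the choice to record from and to for every glyph are all hypothetical simplifications, and clusters is assumed to be non-decreasing (one entry per character, holding the index of the first glyph of that character's cluster).

```c
/* For each of NGLYPHS glyphs, compute the range of characters
   [from_idx[j], to_idx[j]] it is linked back to.  CLUSTERS has NCHARS
   entries; CLUSTER_START[j] is nonzero when glyph J starts a cluster.
   A glyph that does not start a cluster inherits the range of the
   cluster start before it.  */
void
map_glyphs_to_chars (const int *clusters, int nchars,
                     const int *cluster_start, int nglyphs,
                     int *from_idx, int *to_idx)
{
  int from = 0, to = 0, j, k;

  for (j = 0; j < nglyphs; j++)
    {
      if (cluster_start[j])
        {
          /* Skip characters that belong to earlier clusters.  */
          while (from < nchars && clusters[from] < j)
            from++;
          if (from >= nchars)
            from = to = nchars - 1;
          else
            {
              /* The cluster ends just before the first character
                 that maps to a later glyph.  */
              to = nchars - 1;
              for (k = from + 1; k < nchars; k++)
                if (clusters[k] > j)
                  {
                    to = k - 1;
                    break;
                  }
            }
        }
      from_idx[j] = from;
      to_idx[j] = to;
    }
}
```

For a base letter plus one mark forming a single cluster (clusters = {0, 0}, cluster starts = {1, 0}), both glyphs get the range 0..1, which is consistent with the [0 1 ...] prefix on both glyph vectors shown earlier in this thread.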

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 03:04:02 GMT) Full text and rfc822 format available.

Message #38 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Jason Rumney <jasonr <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 11:02:52 +0800

Kenichi Handa <handa <at> gnu.org> writes:

> In article <83txw0aczg.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
>
>> > From: Kenichi Handa <handa <at> gnu.org>
>> > Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
>> > Date: Sat, 18 Aug 2012 11:45:27 +0900
>> > 
>> > So, apparently Emacs on Windows and GNU/Linux uses the
>> > different metrics of glyphs.

Right, but adding the offsets to the corresponding metrics, we get the
same result with both the Windows and GNU/Linux cases, except for the
total height of the font, which I think is because Windows counts
inter-line spacing, while on GNU/Linux, that is separate.

So I'm not sure that this is causing us problems (see Eli's report about
Hebrew), it's just a case of a different reference point being used
between Windows and GNU/Linux.

> For Hebrew too, on Windows, I see the same problem as what
> Steffan <smias <at> yandex.ru> reported:

If you are seeing something different than Eli for Hebrew with the same
font, then I suspect the cause is linked with the version of Uniscribe
that is installed. Maybe diacritic handling for Hebrew and Arabic is a
more recent addition to Uniscribe than the basic support for those
languages.

>> > For instance, in the above case, we may have to render glyphs in
>> > this order (diacritical mark first):
>> > 
>> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>> >   [0 1 1593 969 8 1 8 12 4 nil]

I'm curious as to how we ended up with the same C entry in those
vectors.  Could this be causing us problems later on?  The glyph index
is correct (comparing to the GNU/Linux version), but I wonder if
Uniscribe is referring back to the character at some point and tripping
up because it has been changed.

> I've just read the function uniscribe_shape in
> w32uniscribe.c.  It seems that these are the key API for
> uniscribe:
>
> * ScriptItemize -- no idea what is this

This should be a no-op in Emacs, as we have already split the string into
LGSTRING components.  But if it is not called, subsequent Uniscribe
operations fail, so it must also be initializing some internal
structures.

> * ScriptShape -- perhaps for glyph substitution (GSUB features of opentype)
> * ScriptPlace -- perhaps for glyph positioning (GPOS features of opentype)

Yes, I think that is correct.

> So at first please check the documentation of ScriptShape
> and figure out how it works for bidi script; i.e. what order
> does it expect for input, and what order does it produce.
>
> Next please find the meaning of this code fragment:
>
> 		  /* Detect clusters, for linking codes back to
> 		     characters.  */
> 		  if (attributes[j].fClusterStart)
> 		    {
> 		      while (from < nchars_in_run && clusters[from] < j)
> 			from++;
> 		      if (from >= nchars_in_run)
> 			from = to = nchars_in_run - 1;
> 		      else
> 			{
> 			  int k;
> 			  to = nchars_in_run - 1;
> 			  for (k = from + 1; k < nchars_in_run; k++)
> 			    {
> 			      if (clusters[k] > j)
> 				{
> 				  to = k - 1;
> 				  break;
> 				}
> 			    }
> 			}
> 		    }
>
> The comment refers to "clusters".  I don't know exactly what it
> means in uniscribe, but I guess it relates to
> grapheme clusters, and if so, this part seems to relate to
> the ordering of glyphs in this kind of grapheme cluster:
>
>   [0 1 1593 969 8 1 8 12 4 nil]
>   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

That seems to be correct.  Maybe this is the code that is changing the
character code to 1593.  I seem to recall that something like this was
required for Indic languages to let Emacs know which characters had been
linked back into one glyph.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 04:35:01 GMT) Full text and rfc822 format available.

Message #41 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 13:34:36 +0900

>>>>> On Sat, 18 Aug 2012 11:45:27 +0900, Kenichi Handa <handa <at> gnu.org> said:

> If this problem happens only for bidi scripts, one
> possibility is that Emacs's rendering engine (xdisp.c)
> expects glyphs in a glyph-string are rendered in that order
> from left to right, but the returned glyph-string on Windows
> should be rendered in reverse order.  For instance, in the
> above case, we may have to render glyphs in this order
> (diacritical mark first):

>   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>   [0 1 1593 969 8 1 8 12 4 nil]

The font backend driver on the Mac port is supposed to support
right-to-left shaping (including for non-BMP chars, though I don't
have a good example), and it gives the following result (diacritical
mark comes first) for Courier New 13pt:

  mac-ct:-*-Courier New-normal-normal-normal-*-13-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 1618 760 8 0 2 11 -8 [-1 2 1]]
  [0 1 1593 969 8 0 6 5 4 [-1 0 8]]

In the above example, the grapheme cluster consists of glyphs having
non-nil adjustments (the last element of each vector).  In the
function Ffont_shape_gstring, there is some code that merges grapheme
clusters generated by a font backend driver so each of them starts
with a glyph having non-nil adjustment (except the first grapheme
cluster of the gstring).  I think this is not correct, especially for
right-to-left text, and I disabled that part in the Mac port.  Could
you give an example if you think this part is necessary?

				     YAMAMOTO Mitsuharu
				mituharu <at> math.s.chiba-u.ac.jp

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 07:33:01 GMT) Full text and rfc822 format available.

Message #44 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Kenichi Handa <handa <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 16:32:38 +0900

>>>>> On Sat, 18 Aug 2012 18:33:21 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

> And w32uniscribe.c sets that flag to TRUE a few lines before it calls
> ScriptShape, because Emacs itself reorders characters:

>   for (i = 0; i < nitems; i++)
>     {
>       int nglyphs, nchars_in_run;
>       nchars_in_run = items[i+1].iCharPos - items[i].iCharPos;
>       /* Force ScriptShape to generate glyphs in the same order as
> 	 they are in the input LGSTRING, which is in the logical
> 	 order.  */
>       items[i].a.fLogicalOrder = 1;  <<<<<<<<<<<<<<<<<<<<<<<<

IIUC, the resulting gstring should be ordered as:

  * in the logical order between the grapheme clusters because Emacs
    itself reorders them.
  * in the physical order between the glyphs inside a single grapheme
    cluster because drawing and metric calculation routines for a
    grapheme cluster do not know about the direction.

The font backend driver on the Mac port is implemented as above, and
seems to work correctly.  The APIs used for shaping generate glyphs
either in the physical order (Core Text) or in the logical order
(NSLayoutManager), so I had to reorder the information about the
generated glyphs locally by maintaining a permutation on the glyph
indices.  You can look at the variable `permutation' in the function
`mac_ctfont_shape' (for Core Text, in src/macfont.c) or the function
`mac_font_shape_1' (for NSLayoutManager, in src/macappkit.m) in the
source of the Mac port.
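
The ordering described above can be sketched in C as follows.  This is
not the Mac port's code: the function name `cluster_permutation' and
the `cluster_start' flag array are hypothetical, and the sketch assumes
that the physical order inside a right-to-left cluster is simply the
reverse of the logical order, which need not hold for every font.

```c
#include <stddef.h>

/* Hypothetical helper: fill PERM so that PERM[i] is the index, in the
   shaper's logical-order output, of the glyph to store at position i.
   Clusters keep their logical order; glyphs inside each cluster are
   reversed when RTL is nonzero.  CLUSTER_START[i] is nonzero if glyph
   i begins a cluster (glyph 0 is always treated as a start).  */
void
cluster_permutation (const int *cluster_start, size_t nglyphs,
                     int rtl, size_t *perm)
{
  size_t from = 0;

  while (from < nglyphs)
    {
      /* Find the last glyph of the cluster that begins at FROM.  */
      size_t to = from;
      while (to + 1 < nglyphs && !cluster_start[to + 1])
        to++;

      for (size_t j = from; j <= to; j++)
        perm[j] = rtl ? to - (j - from) : j;

      from = to + 1;
    }
}
```

For two clusters of two and three glyphs, the right-to-left
permutation is {1, 0, 4, 3, 2}: the clusters stay in place and only
the glyphs within each one are swapped around.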

				     YAMAMOTO Mitsuharu
				mituharu <at> math.s.chiba-u.ac.jp

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 12:53:02 GMT) Full text and rfc822 format available.

Message #47 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 21:51:44 +0900

In article <wlvcgfuyjt.wl%mituharu <at> math.s.chiba-u.ac.jp>, YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp> writes:

> IIUC, the resulting gstring should be ordered as:

>   * in the logical order between the grapheme clusters because Emacs
>     itself reorders them.

>   * in the physical order between the glyphs inside a single grapheme
>     cluster because drawing and metric calculation routines for a
>     grapheme cluster do not know about the direction.

Almost right.  We can't say whether the order within a grapheme
cluster is physical or logical; it's just the "drawing"
order.

For instance, given the Arabic text "AbCdEf" (consonants uppercase,
vowels lowercase), the gstring returned by a shaper should be "XYZ"
(where X is actually a grapheme cluster for "Ab", Y for "Cd", and Z
for "Ef").  How the glyphs in each cluster should be ordered depends
on the glyph metrics.  If the glyph for "A" has positive xadvance,
and the glyph for "b" has negative lbearing (and zero xadvance), the
order should be "Ab", because the display engine should draw A and b
in that order to align b on/under A.  But if the glyph for "b" has
non-negative lbearing (and zero xadvance), the order should be "bA".
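
As a rough illustration of that rule in C (a sketch only:
`struct glyph_metrics' and `mark_drawn_after_base' are made-up names,
not Emacs functions, and only the two cases described above for a
zero-advance mark are considered):

```c
/* Hypothetical, simplified glyph metrics in pixels.  */
struct glyph_metrics
{
  int xadvance;   /* how far the pen moves after drawing the glyph */
  int lbearing;   /* left edge of the ink relative to the pen */
};

/* Return 1 if MARK should follow BASE in the gstring ("Ab"), 0 if it
   should precede it ("bA").  A zero-advance mark with negative
   lbearing reaches back over a base that has already advanced the
   pen; one with non-negative lbearing must be drawn first, so that
   the base lands on top of it before the pen advances.  */
int
mark_drawn_after_base (struct glyph_metrics base, struct glyph_metrics mark)
{
  return base.xadvance > 0 && mark.xadvance == 0 && mark.lbearing < 0;
}
```

For the uniscribe glyphs quoted earlier (a sukun with lbearing 3 and
xadvance 0), this returns 0, i.e. the mark would have to come first.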

> The font backend driver on the Mac port is implemented as above, and
> seems to work correctly.  The APIs used for shaping generate glyphs
> either in the physical order (Core Text) or in the logical order
> (NSLayoutManager), so I had to reorder the information about the
> generated glyphs locally by maintaining a permutation on the glyph
> indices.  You can look at the variable `permutation' in the function
> `mac_ctfont_shape' (for Core Text, in src/macfont.c) or the function
> `mac_font_shape_1' (for NSLayoutManager, in src/macappkit.m) in the
> source of the Mac port.

I don't have the code of the Mac port at hand now.  How did you
identify the boundaries of grapheme clusters?  Do Core Text and
NSLayoutManager return that information?

Anyway, perhaps w32uniscribe.c should do similar
reordering, or should be fixed to do that reordering
correctly.

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 13:21:02 GMT) Full text and rfc822 format available.

Message #50 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 22:20:27 +0900

In article <837gswgqpq.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > * ScriptPlace -- perhaps for glyph positioning (GPOS features of opentype)
> > 
> > So at first please check the documentation of ScriptShape
> > and figure out how it works for bidi script; i.e. what order
> > does it expect for input, and what order does it produce.

> From the above page:

>   If fLogicalOrder is set to TRUE in the SCRIPT_ANALYSIS structure, the
>   function always generates glyphs in the same order as the original
>   Unicode characters. If fLogicalOrder is set to FALSE, the function
>   generates right-to-left items in reverse order so that ScriptTextOut
>   does not have to reverse them before calling ExtTextOut.

> And w32uniscribe.c sets that flag to TRUE a few lines before it calls
> ScriptShape, because Emacs itself reorders characters:

I see, so the output of ScriptShape is still in logical order,
and that explains this glyph order (glyph code 969 is
for the consonant and 760 is for the vowel).

  [0 1 1593 969 8 1 8 12 4 nil]
  [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

But, then...

>   If fLogicalOrder is set to FALSE, the function
>   generates right-to-left items in reverse order so that ScriptTextOut
>   does not have to reverse them before calling ExtTextOut.

Doesn't that mean that, if fLogicalOrder is TRUE, ScriptPlace
generates xadvance and left/right bearings on the assumption
that the glyphs are reordered before they are actually rendered?

> You can find some details here:

>    http://msdn.microsoft.com/en-us/library/windows/desktop/dd317792%28v=vs.85%29.aspx

> I hope this will allow you to understand the meaning of the above
> code, by looking at how the results are used in the calls to
> LGLYPH_SET_* macros right below the above snippet.

Thank you for the pointer.  I have questions about the section
"Display Text Using Uniscribe" on that page.  Step 2
says:

    1. Extract an array of bidirectional embedding levels,
       one per range. The embedding level is given by
       (SCRIPT_ITEM) si.(SCRIPT_ANALYSIS) a. (SCRIPT_STATE)
       s.uBidiLevel.

From what, and how, do we extract that array?

    2. Pass this array to ScriptLayout to generate a map of
       visual positions to logical positions.

No code in Emacs calls ScriptLayout.  Isn't that
a problem?
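
For reference, here is a portable C sketch of what such a
visual-to-logical map looks like.  It does not call ScriptLayout:
`visual_to_logical' is a hypothetical stand-in that applies the
reordering rule (L2) of the Unicode Bidirectional Algorithm to an
array of embedding levels, which is assumed here to be what that step
computes.

```c
#include <stddef.h>

/* Hypothetical stand-in for the mapping step: given one bidi embedding
   level per run, fill V2L so that V2L[visual position] is the logical
   index of the run displayed there.  Implements rule L2 of the Unicode
   Bidirectional Algorithm: from the highest level down to the lowest
   odd level, reverse every maximal sequence of runs at that level or
   higher.  */
void
visual_to_logical (const unsigned char *levels, size_t nruns, size_t *v2l)
{
  unsigned char highest = 0, lowest_odd = 255;

  for (size_t i = 0; i < nruns; i++)
    {
      v2l[i] = i;
      if (levels[i] > highest)
        highest = levels[i];
      if ((levels[i] & 1) && levels[i] < lowest_odd)
        lowest_odd = levels[i];
    }
  if (lowest_odd == 255)
    return;                     /* No right-to-left runs at all.  */

  for (unsigned level = highest; level >= lowest_odd; level--)
    {
      size_t i = 0;
      while (i < nruns)
        {
          if (levels[v2l[i]] < level)
            {
              i++;
              continue;
            }
          /* [i, end) is a maximal sequence at LEVEL or higher.  */
          size_t end = i;
          while (end < nruns && levels[v2l[end]] >= level)
            end++;
          for (size_t a = i, b = end - 1; a < b; a++, b--)
            {
              size_t tmp = v2l[a];
              v2l[a] = v2l[b];
              v2l[b] = tmp;
            }
          i = end;
        }
    }
}
```

With a single run the map is trivially {0}; reordering only matters
once several runs with different levels are involved.  For levels
{0, 1, 1, 0} the map is {0, 2, 1, 3}.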

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 13:38:02 GMT) Full text and rfc822 format available.

Message #53 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Jason Rumney <jasonr <at> gnu.org>
Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 22:37:29 +0900

In article <87393j7fdv.fsf <at> gnu.org>, Jason Rumney <jasonr <at> gnu.org> writes:

>>> > So, apparently Emacs on Windows and GNU/Linux uses the
>>> > different metrics of glyphs.

> Right, but adding the offsets to the corresponding metrics, we get the
> same result with both the Windows and GNU/Linux cases,

?? I don't understand what you mean.

> except for the
> total height of the font, which I think is because Windows counts
> inter-line spacing, while on GNU/Linux, that is separate.

I'm not sure, but currently, y-axis metrics are not the problem.

> > For Hebrew too, on Windows, I see the same problem as what
> > Steffan <smias <at> yandex.ru> reported:

> If you are seeing something different than Eli for Hebrew with the same
> font, then I suspect the cause is linked with the version of Uniscribe
> that is installed. Maybe diacritic handling for Hebrew and Arabic is a
> more recent addition to Uniscribe than the basic support for those
> languages.

Perhaps.  I tested it on Windows 7, and the tested version
of Emacs was 24.0.?, not the latest one.  I'm now
downloading the latest Windows binary of Emacs.

>>> > For instance, in the above case, we may have to render glyphs in
>>> > this order (diacritical mark first):
>>> > 
>>> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>>> >   [0 1 1593 969 8 1 8 12 4 nil]

> I'm curious as to how we ended up with the same C entry in those
> vectors.  Could this be causing us problems later on?

I don't think so.  As far as I remember, the C entries in a
glyph string are not used after shaping.

> The glyph index
> is correct (comparing to the GNU/Linux version), but I wonder if
> Uniscribe is referring back to the character at some point and tripping
> up because it has been changed.

I have no idea about that.

> > The comment refers to "clusters".  I don't know exactly what it
> > means in uniscribe, but I guess it relates to
> > grapheme clusters, and if so, this part seems to relate to
> > the ordering of glyphs in this kind of grapheme cluster:
> >
> >   [0 1 1593 969 8 1 8 12 4 nil]
> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

> That seems to be correct.

Why?  As the xadvance of the first glyph is 8, and
the xoffset of the second glyph is 1, the second glyph is
never drawn at the same column as the first glyph.
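
The arithmetic behind that, as a small C sketch.  It assumes a
simplified drawing model (not taken from xdisp.c): each glyph's ink
starts at pen + xoff + lbearing, and the pen then moves right by the
glyph's xadvance.  The struct and function names are made up for
illustration.

```c
/* Hypothetical subset of an LGLYPH's metrics, in pixels.  */
struct lglyph_metrics
{
  int width;      /* xadvance */
  int lbearing;
  int rbearing;
  int xoff;       /* X-OFF from the adjustment vector, 0 if nil */
};

/* Under the simplified model described above, return the x coordinate
   (relative to the start of the cluster) where the ink of glyph INDEX
   begins, drawing GLYPHS from left to right in array order.  */
int
ink_left_edge (const struct lglyph_metrics *glyphs, int index)
{
  int pen = 0;

  for (int i = 0; i < index; i++)
    pen += glyphs[i].width;
  return pen + glyphs[index].xoff + glyphs[index].lbearing;
}
```

For the two uniscribe glyphs above, the base's ink starts at x = 1 and
the sukun's at x = 12, past the 8-pixel cell.  Drawn in the reverse
order, the sukun's ink starts at x = 4, inside the cell.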

> Maybe this is the code that is changing the
> character code to 1593.  I seem to recall that something like this was
> required for Indic languages to let Emacs know which characters had been
> linked back into one glyph.

Is that Windows specific?

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 16:17:01 GMT) Full text and rfc822 format available.

Message #56 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Jason Rumney <jasonr <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 20 Aug 2012 00:16:19 +0800

Kenichi Handa <handa <at> gnu.org> writes:

> In article <87393j7fdv.fsf <at> gnu.org>, Jason Rumney <jasonr <at> gnu.org> writes:
>
>>>> > So, apparently Emacs on Windows and GNU/Linux uses the
>>>> > different metrics of glyphs.
>
>> Right, but adding the offsets to the corresponding metrics, we get the
>> same result with both the Windows and GNU/Linux cases,
>
> ?? I don't understand what you mean.

I mean comparing the two cases below:

Composed with the following character(s) "ْ" using this font:
  uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
by these glyphs:
  [0 1 1593 969 8 1 8 12 4 nil]
  [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

Composed with the following character(s) "ْ" using this font:
  xft:-monotype-Courier New-normal-normal-normal-*-13-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 1593 969 8 2 8 4 4 nil]
  [0 1 1618 760 0 -6 -3 8 -11 [-9 2 0]]

WIDTH = same in both cases.
(LBEARING - X-OFF) = off by 1 in both cases
(RBEARING - X-OFF) = off by 1 in second case

The off-by-one difference is probably due to different rounding conventions
in the respective font drawing engines.
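
Spelling that comparison out in C (a sketch; the struct and helper
names are made up, and the numbers in the note below are the LBEARING,
RBEARING and X-OFF entries copied from the two pairs of glyph vectors
above):

```c
#include <stdlib.h>

/* Hypothetical subset of an LGLYPH: bearings plus the X-OFF
   adjustment (0 when the adjustment vector is nil).  */
struct bearings
{
  int lbearing;
  int rbearing;
  int xoff;
};

/* Absolute difference between two backends in (LBEARING - X-OFF).  */
int
lbearing_delta (struct bearings a, struct bearings b)
{
  return abs ((a.lbearing - a.xoff) - (b.lbearing - b.xoff));
}

/* Absolute difference between two backends in (RBEARING - X-OFF).  */
int
rbearing_delta (struct bearings a, struct bearings b)
{
  return abs ((a.rbearing - a.xoff) - (b.rbearing - b.xoff));
}
```

With the base glyph as {1, 8, 0} (uniscribe) against {2, 8, 0} (xft),
and the sukun as {3, 6, 1} against {-6, -3, -9}, lbearing_delta is 1
for both glyphs, while rbearing_delta is 0 for the base and 1 for the
sukun.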

>> > The comment refers to "clusters".  I don't know exactly what it
>> > means in uniscribe, but I guess it relates to
>> > grapheme clusters, and if so, this part seems to relate to
>> > the ordering of glyphs in this kind of grapheme cluster:
>> >
>> >   [0 1 1593 969 8 1 8 12 4 nil]
>> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>
>> That seems to be correct.
>
> Why?  As the xadvance of the first glyph is 8, and

I wasn't referring to the values in the vector, but your analysis, as
you said "I don't know ... but I guess ...".

>> Maybe this is the code that is changing the
>> character code to 1593.  I seem to recall that something like this was
>> required for Indic languages to let Emacs know which characters had been
>> linked back into one glyph.
>
> Is that Windows specific?

I don't think so. At the time, it wasn't entirely clear to me what the
requirements were for font backends, and I tried various things while
trying to figure out what the general font handling code was
expecting from the font_script function. I eventually determined that
Emacs was treating a run of glyphs that came back from font_shape with
the same character code as a single glyph for cursor movement purposes,
which prevented display problems when moving the cursor through text.
The implementation may have changed since then to make this unnecessary.

Or maybe I am misremembering, and it was more about the difficulty of
figuring out which glyphs correspond to which characters in cases where
there is not a one-to-one correspondence, and I didn't attempt to
resolve this difficulty because the current code seems to be working.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 17:58:01 GMT) Full text and rfc822 format available.

Message #59 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Jason Rumney <jasonr <at> gnu.org>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 20:56:57 +0300

> From: Jason Rumney <jasonr <at> gnu.org>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  11860 <at> debbugs.gnu.org,  smias <at> yandex.ru
> Date: Sun, 19 Aug 2012 11:02:52 +0800
> 
> Kenichi Handa <handa <at> gnu.org> writes:
> 
> > In article <83txw0aczg.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> >
> >> > From: Kenichi Handa <handa <at> gnu.org>
> >> > Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> >> > Date: Sat, 18 Aug 2012 11:45:27 +0900
> >> > 
> >> > So, apparently Emacs on Windows and GNU/Linux uses the
> >> > different metrics of glyphs.
> 
> Right, but adding the offsets to the corresponding metrics, we get the
> same result with both the Windows and GNU/Linux cases

I think the results of addition are not relevant to the problem.  The
problem is that the diacriticals and/or vowels are not drawn at
correct horizontal positions.  The values of the offsets are directly
relevant to that, because they describe how many pixels to advance
after drawing each glyph.  By contrast, the sum of the offsets will be
always approximately the same, since the entire grapheme cluster
occupies a single character cell.

> So I'm not sure that this is causing us problems (see Eli's report about
> Hebrew), it's just a case of a different reference point being used
> between Windows and GNU/Linux.

My report about Hebrew is not relevant either; see below.

> If you are seeing something different than Eli for Hebrew with the same
> font, then I suspect the cause is linked with the version of Uniscribe
> that is installed. Maybe diacritic handling for Hebrew and Arabic is a
> more recent addition to Uniscribe than the basic support for those
> languages.

That appears to be the case, indeed.  My initial attempts to reproduce
this were on XP SP3, where Hebrew rendering appeared to be OK.  I now
tried on Windows 7 and there I see the problem with Hebrew as well.

Moreover, when I type the Hebrew characters specified by the OP, I
don't see that the uniscribe_shape function is called at all on XP: a
breakpoint inside it never breaks.  On Windows 7, that function does
get called.

Jason, how can I find out whether Uniscribe is used for rendering
Hebrew, or why doesn't Emacs call uniscribe_shape?  (I know about
uniscribe_font->cache, but I don't see that function called even if I
start Emacs with a breakpoint in it, so it seems the cache is not the
issue here.  The cache is per application, right?)

For Arabic characters in the recipe, uniscribe_shape _is_ called on
XP.  I guess that's why the problem with Arabic is visible on both XP
and Windows 7.

For the record, here's the output of "C-u C-x =" on XP for the Hebrew
character composition mentioned earlier:

	       position: 193 of 194 (99%), column: 1
	      character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: iso-8859-8 (ISO/IEC 8859/8)
  code point in charset: 0xE2
		 syntax: w 	which means: word
	       category: .:Base, R:Right-to-left (strong)
	       to input: type "d" with hebrew-full
	    buffer code: #xD7 #x92
	      file code: #xE2 (encoded by coding system hebrew-iso-8bit-dos)
		display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-8
  by these glyphs:
    [0 1 1490 674 8 0 6 12 4 nil]
    [0 1 1467 663 8 0 7 12 4 [-8 0 0]]

Compare with the output on Windows 7 to see the differences:

	       position: 193 of 194 (99%), column: 1
	      character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x05D2
		 syntax: w 	which means: word
	       category: .:Base, R:Right-to-left (strong)
	       to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
	    buffer code: #xD7 #x92
	      file code: not encodable by coding system iso-latin-1-dos
		display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1490 674 8 1 6 12 4 nil]
    [0 1 1490 663 0 2 6 12 4 nil]

And here's the output of "C-u C-x =" for the Arabic character Ayin
with sukun on XP:

	       position: 197 of 198 (99%), column: 0
	      character: ع‎ (displayed as ع‎) (codepoint 1593, #o3071, #x639)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x0639
		 syntax: w 	which means: word
	       category: .:Base, R:Right-to-left (strong), b:Arabic
	    buffer code: #xD8 #xB9
	      file code: not encodable by coding system hebrew-iso-8bit-dos
		display: composed to form "عْ" (see below)

  Composed with the following character(s) "ْ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1593 969 8 2 8 12 4 nil]
    [0 1 1593 1028 0 3 6 12 4 nil]

Note that the glyph index of the sukun is different from the Windows
7 output.  I have no idea why.

> >> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> >> >   [0 1 1593 969 8 1 8 12 4 nil]
> 
> I'm curious as to how we ended up with the same C entry in those
> vectors.

That's because the code in uniscribe_shape does this:

		  LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
						 + from]);

and it does that for all the 'nglyphs' glyphs produced by ScriptPlace.

As Handa-san writes, the character code is never used, because we have
the font glyph index and its metrics, so I think this is a non-issue.
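
To see why both glyphs end up with the same C entry, here is the
cluster-detection loop from uniscribe_shape pulled out into a
standalone C function.  The wrapper name `cluster_char_range' and its
parameter list are hypothetical; the loop body follows the snippet
quoted in this thread.  The cluster map {0, 0} used in the note below
is an assumption about what ScriptShape returns for Ayin plus sukun,
not something captured from the debugger.

```c
/* Hypothetical wrapper around the quoted loop.  CLUSTERS maps each
   character of the run to the index of the first glyph of its
   cluster.  For glyph J, which starts a cluster, update *FROM and
   *TO to the range of characters belonging to that cluster.  *FROM
   carries over from the previous call, as in the original loop.  */
void
cluster_char_range (const unsigned short *clusters, int nchars_in_run,
                    int j, int *from, int *to)
{
  while (*from < nchars_in_run && clusters[*from] < j)
    (*from)++;
  if (*from >= nchars_in_run)
    *from = *to = nchars_in_run - 1;
  else
    {
      *to = nchars_in_run - 1;
      for (int k = *from + 1; k < nchars_in_run; k++)
        if (clusters[k] > j)
          {
            *to = k - 1;
            break;
          }
    }
}
```

With clusters {0, 0}, glyph 0 gets from = 0 and to = 1.  Glyph 1 does
not start a cluster, so it keeps the same `from', and LGLYPH_SET_CHAR
stores chars[from] (1593, Ayin) in both vectors.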

> Could this be causing us problems later on?  The glyph index
> is correct (comparing to the GNU/Linux version), but I wonder if
> Uniscribe is referring back to the character at some point and tripping
> up because it has been changed.

Uniscribe cannot refer to this code, because Uniscribe doesn't use
LGSTRING, IIUC.  Or does it?  (If it does, please show where in the
code it uses that value.)

> > 		  /* Detect clusters, for linking codes back to
> > 		     characters.  */
> > 		  if (attributes[j].fClusterStart)
> > 		    {
> > 		      while (from < nchars_in_run && clusters[from] < j)
> > 			from++;
> > 		      if (from >= nchars_in_run)
> > 			from = to = nchars_in_run - 1;
> > 		      else
> > 			{
> > 			  int k;
> > 			  to = nchars_in_run - 1;
> > 			  for (k = from + 1; k < nchars_in_run; k++)
> > 			    {
> > 			      if (clusters[k] > j)
> > 				{
> > 				  to = k - 1;
> > 				  break;
> > 				}
> > 			    }
> > 			}
> > 		    }
> >
> > The comment refers to "clusters".  I don't know exactly what it
> > means in uniscribe, but I guess it relates to
> > grapheme clusters, and if so, this part seems to relate to
> > the ordering of glyphs in this kind of grapheme cluster:
> >
> >   [0 1 1593 969 8 1 8 12 4 nil]
> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> 
> That seems to be correct.  Maybe this is the code that is changing the
> character code to 1593.

It doesn't _change_ the character code, it simply sets it to the code
of the base character.  But again, I don't think this is relevant.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 18:23:01 GMT) Full text and rfc822 format available.

Message #62 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>, Jason Rumney <jasonr <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 21:22:40 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> Date: Sat, 18 Aug 2012 11:45:27 +0900
> 
> So, apparently Emacs on Windows and GNU/Linux uses the
> different metrics of glyphs.  As the shaper on GNU/Linux
> (m17n-lib library) works correctly for the same font, and
> the other applications on Windows have no problem, I suspect
> that the problem is in Emacs' interface with uniscribe
> (w32font.c or w32uniscribe.c).

I agree.

> If this problem happens only for bidi scripts

Can you suggest how to test this hypothesis?

> one possibility is that Emacs's rendering engine (xdisp.c) expects
> glyphs in a glyph-string are rendered in that order from left to
> right, but the returned glyph-string on Windows should be rendered
> in reverse order.

You may be right, but it's hard to be sure.  At least the advances[]
array returned by ScriptPlace seems to point in that direction.
Here's what I see in the debugger:

  Breakpoint 8, uniscribe_shape (lgstring=55041941) at w32uniscribe.c:373
  373                       LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
  (gdb) p items@nitems
  $1 = {0x35195a0}
  (gdb) p items[0]@nitems
  $2 = {{
      iCharPos = 0,
      a = {
	eScript = 26,
	fRTL = 1,
	fLayoutRTL = 1,
	fLinkBefore = 0,
	fLinkAfter = 0,
	fLogicalOrder = 1,
	fNoGlyphIndex = 0,
	s = {
	  uBidiLevel = 1,
	  fOverrideDirection = 0,
	  fInhibitSymSwap = 0,
	  fCharShape = 0,
	  fDigitSubstitute = 0,
	  fInhibitLigate = 0,
	  fDisplayZWG = 0,
	  fArabicNumContext = 0,
	  fGcpClusters = 0,
	  fReserved = 0,
	  fEngineReserved = 0
	}
      }
    }}
  (gdb) p nitems
  $3 = 1
  (gdb) p nglyphs
  $4 = 2
  (gdb) p advances[0]@nglyphs
  $5 = {8, 0}
  (gdb) p offsets[0]@nglyphs
  $6 = {{
      du = 0,
      dv = 0
    }, {
      du = 1,
      dv = -2
    }}
  (gdb) p chars[0]@2
  $7 = L"\x639\x652"

(Note that the fRTL member of items[0].a is set to TRUE.)  My
understanding of the advances[] array is that it gives, for each glyph
in the cluster, the number of pixels to advance to the right after
drawing the glyph.  So the fact that it is 8 for the first (base)
character and zero for the second one tells me that this grapheme
cluster is supposed to be rendered in reverse order: first the Sukun,
then Ayin at the same location, and then advance by 8 pixels for the
next character.  Is this correct?

If it is correct, then how come the glyphs shown on GNU/Linux also
have non-zero value of xadvance:

  [0 1 1593 969 8 2 8 4 4 nil]
  [0 1 1618 760 0 -6 -3 8 -11 [-9 2 0]]

The value 8 after 969 comes directly from xadvance, as this code in
ftfont.c shows:

      LGLYPH_SET_WIDTH (lglyph, g->xadv >> 6);

Is the meaning of xadvance in libotf different from its meaning in
Uniscribe?  (And why is the glyph string element called WIDTH instead
of ADVANCE?)  If not, what am I missing?

> For instance, in the above case, we may have to render glyphs in
> this order (diacritical mark first):
> 
>   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>   [0 1 1593 969 8 1 8 12 4 nil]

I tried the naive patch below, but it didn't quite work.  It seems
like those changes somehow prevented character composition.  Perhaps
Handa-san could give me some guidance here.

> I think the further debugging must be done by those who
> knows uniscribe, w32font.c, and w32uniscribe.c.

It's very hard, given that glyph-string documentation leaves a lot to
be desired, and the way its various components are used during drawing
is also left without clear documentation.  E.g., this:

    FROM-IDX and TO-IDX are used internally and should not be touched.

is not really helpful for explaining what FROM-IDX and TO-IDX are, so
how can I figure out whether the code you asked about is doing TRT?
And without knowing what each component of glyph-string is used for
during drawing, how can I compare the values produced by Uniscribe
APIs with what glyph-string needs?  If someone could explain all those
things, it would make debugging possible.  Otherwise, I'm just
randomly poking around...

Here's the patch I tried:

--- src/w32uniscribe.c~	2012-07-08 07:24:56.000000000 +0300
+++ src/w32uniscribe.c	2012-08-19 15:55:17.323623900 +0300
@@ -331,17 +331,13 @@ uniscribe_shape (Lisp_Object lgstring)
 		  Lisp_Object lglyph = LGSTRING_GLYPH (lgstring, lglyph_index);
 		  ABC char_metric;
 		  unsigned gl;
+		  int j1;
 
 		  if (NILP (lglyph))
 		    {
 		      lglyph = Fmake_vector (make_number (LGLYPH_SIZE), Qnil);
 		      LGSTRING_SET_GLYPH (lgstring, lglyph_index, lglyph);
 		    }
-		  /* Copy to a 32-bit data type to shut up the
-		     compiler warning in LGLYPH_SET_CODE about
-		     comparison being always false.  */
-		  gl = glyphs[j];
-		  LGLYPH_SET_CODE (lglyph, gl);
 
 		  /* Detect clusters, for linking codes back to
 		     characters.  */
@@ -365,6 +361,16 @@ uniscribe_shape (Lisp_Object lgstring)
 			    }
 			}
 		    }
+		  if (items[i].a.fRTL)
+		    j1 = to - (j - from);
+		  else
+		    j1 = j;
+
+		  /* Copy to a 32-bit data type to shut up the
+		     compiler warning in LGLYPH_SET_CODE about
+		     comparison being always false.  */
+		  gl = glyphs[j1];
+		  LGLYPH_SET_CODE (lglyph, gl);
 
 		  LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
 						 + from]);
@@ -372,13 +378,13 @@ uniscribe_shape (Lisp_Object lgstring)
 		  LGLYPH_SET_TO (lglyph, items[i].iCharPos + to);
 
 		  /* Metrics.  */
-		  LGLYPH_SET_WIDTH (lglyph, advances[j]);
+		  LGLYPH_SET_WIDTH (lglyph, advances[j1]);
 		  LGLYPH_SET_ASCENT (lglyph, font->ascent);
 		  LGLYPH_SET_DESCENT (lglyph, font->descent);
 
 		  result = ScriptGetGlyphABCWidth (context,
 						   &(uniscribe_font->cache),
-						   glyphs[j], &char_metric);
+						   glyphs[j1], &char_metric);
 		  if (result == E_PENDING && !context)
 		    {
 		      /* Cache incomplete... */
@@ -387,7 +393,7 @@ uniscribe_shape (Lisp_Object lgstring)
 		      old_font = SelectObject (context, FONT_HANDLE (font));
 		      result = ScriptGetGlyphABCWidth (context,
 						       &(uniscribe_font->cache),
-						       glyphs[j], &char_metric);
+						       glyphs[j1], &char_metric);
 		    }
 
 		  if (SUCCEEDED (result))
@@ -399,17 +405,17 @@ uniscribe_shape (Lisp_Object lgstring)
 		  else
 		    {
 		      LGLYPH_SET_LBEARING (lglyph, 0);
-		      LGLYPH_SET_RBEARING (lglyph, advances[j]);
+		      LGLYPH_SET_RBEARING (lglyph, advances[j1]);
 		    }
 
-		  if (offsets[j].du || offsets[j].dv)
+		  if (offsets[j1].du || offsets[j1].dv)
 		    {
 		      Lisp_Object vec;
 		      vec = Fmake_vector (make_number (3), Qnil);
-		      ASET (vec, 0, make_number (offsets[j].du));
-		      ASET (vec, 1, make_number (offsets[j].dv));
+		      ASET (vec, 0, make_number (offsets[j1].du));
+		      ASET (vec, 1, make_number (offsets[j1].dv));
 		      /* Based on what ftfont.c does... */
-		      ASET (vec, 2, make_number (advances[j]));
+		      ASET (vec, 2, make_number (advances[j1]));
 		      LGLYPH_SET_ADJUSTMENT (lglyph, vec);
 		    }
 		  else

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 18:46:02 GMT) Full text and rfc822 format available.

Message #65 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 21:44:56 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
> Date: Sun, 19 Aug 2012 22:20:27 +0900
> 
> >   If fLogicalOrder is set to FALSE, the function
> >   generates right-to-left items in reverse order so that ScriptTextOut
> >   does not have to reverse them before calling ExtTextOut.
> 
> Doesn't it mean that, if fLogicalOrder is TRUE, ScriptPlace
> generates xadvance and left/right bearing while expecting
> that the glyphs are re-ordered before actually rendered?

It could mean that.  But it's still only a guess, as the documentation
is unclear.

> > You can find some details here:
> 
> >    http://msdn.microsoft.com/en-us/library/windows/desktop/dd317792%28v=vs.85%29.aspx
> 
> > I hope this will allow you to understand the meaning of the above
> > code, by looking at how the results are used in the calls to
> > LGLYPH_SET_* macros right below the above snippet.
> 
> Thank you for the pointer.

Here are 3 more:

  http://maxradi.us/documents/uniscribe/
  http://www.catch22.net/tuts/uniscribe-mysteries
  http://www.catch22.net/tuts/more-uniscribe-mysteries

> I have questions in the section
> "Display Text Using Uniscribe" in that page.  The step 2
> says:
> 
>     1. Extract an array of bidirectional embedding levels,
>        one per range. The embedding level is given by
>        (SCRIPT_ITEM) si.(SCRIPT_ANALYSIS) a. (SCRIPT_STATE)
>        s.uBidiLevel.
> 
> From what and how to extract that array?

From items[i].a.s.uBidiLevel.  I showed an example in an earlier
message, where you can see that uBidiLevel is 1 (i.e. RTL).

We don't use this information because Emacs reorders characters
itself, it doesn't need the UAX#9 implementation contained in
Uniscribe.

>     2. Pass this array to ScriptLayout to generate a map of
>        visual positions to logical positions.
> 
> There's no place in Emacs that calls ScriptLayout.  Isn't it
> a problem?

I don't think so, at least not directly.  ScriptLayout actually draws
the shaped glyphs on the screen.  Emacs doesn't use it because it
draws the glyphs by itself, using the information in the glyph-strings
generated from the data returned by the shaping engine.  Or am I
missing something?

However, the ScriptLayout issue affects us indirectly because most
(all?) other applications do use ScriptLayout, where Emacs draws
glyphs by itself.  That is why one of the references above explicitly
says:

  pGoffset  [...] The application generally doesn’t have to pay
      attention to these offsets at all. They are generated by
      ScriptPlace and used by ScriptTextOut, and all the application
      needs to do is keep track of the values in the meantime.

The problem is, Emacs does use "all these offsets" and other stuff,
and so we are being hit by their insufficient documentation.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 18:53:01 GMT) Full text and rfc822 format available.

Message #68 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 21:52:38 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: eliz <at> gnu.org,  11860 <at> debbugs.gnu.org,  smias <at> yandex.ru
> Date: Sun, 19 Aug 2012 22:37:29 +0900
> 
> > > The comment refers to "clusters".  I don't know exactly what it
> > > means in Uniscribe, but I guess it relates to grapheme
> > > clusters, and if so, this part seems to relate to the
> > > ordering of glyphs in this kind of grapheme cluster:
> > >
> > >   [0 1 1593 969 8 1 8 12 4 nil]
> > >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> 
> > That seems to be correct.
> 
> Why?  As the xadvance of the first glyph is 8, and
> the xoffset of the second glyph is 1, the second glyph is
> never drawn at the same column as the first glyph.

I agree with your analysis, but then it is unclear to me why the other
components of the vector are different between GNU/Linux and Windows 7.
Can you explain them?

For instance, this (Windows):

  [0 1 1593 969 8 1 8 12 4 nil]

vs this (GNU/Linux):

  [0 1 1593 969 8 2 8 4 4 nil]

raises the following questions:

 . why are the values of LBEARING different (1 vs 2)?
 . why are the values of ASCENT different (12 vs 4)?  The Windows code
   takes ASCENT and DESCENT values from the font -- is that correct?

The fonts are identical, so I'd expect identical values here, at least
for the base character.  It is hard to debug more complex portions of
the code when such basic values already differ.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 18:55:01 GMT) Full text and rfc822 format available.

Message #71 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Werner LEMBERG <wl <at> gnu.org>
To: eliz <at> gnu.org
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels)
	don't appear
Date: Sun, 19 Aug 2012 20:53:54 +0200 (CEST)

>   http://maxradi.us/documents/uniscribe/
>   http://www.catch22.net/tuts/uniscribe-mysteries
>   http://www.catch22.net/tuts/more-uniscribe-mysteries

I suggest contacting Behdad Esfahbod <behdad <at> behdad.org>, developer of
the HarfBuzz library (which is used e.g. in Firefox for OpenType
layout).  He can probably help with explaining weird Uniscribe
behaviour.


    Werner

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 19 Aug 2012 18:56:01 GMT) Full text and rfc822 format available.

Message #74 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Jason Rumney <jasonr <at> gnu.org>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 21:54:51 +0300

> From: Jason Rumney <jasonr <at> gnu.org>
> Cc: eliz <at> gnu.org,  11860 <at> debbugs.gnu.org,  smias <at> yandex.ru
> Date: Mon, 20 Aug 2012 00:16:19 +0800
> 
> Or maybe I am misremembering, and it was more about the difficulty in
> figuring out which glyphs correspond to which characters in cases where
> there is not a one-to-one correspondence

This difficulty is indeed there.  How does libotf solve this problem?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 20 Aug 2012 14:59:02 GMT) Full text and rfc822 format available.

Message #77 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 20 Aug 2012 23:57:53 +0900

In article <83mx1qd85g.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > Or maybe I am misremembering, and it was more about the difficulty in
> > figuring out which glyphs correspond to which characters in cases where
> > there is not a one-to-one correspondence

> This difficulty is indeed there.  How does libotf solve this problem?

If one GSUB feature converts the input gstring "AB" to "ab",
all we can say is that the resulting glyph sequence "ab"
corresponds to "AB"; we can't say that "a" corresponds to "A"
and "b" corresponds to "B".

So, m17nlib and libotf set the same FROM and TO indices on
both "a" and "b" (in the above case, 0 and 1 respectively),
and thus they constitute a single grapheme cluster, which Emacs
treats as a single glyph on cursor movement.

But when one GSUB feature converts "A" to "a" and another
converts "B" to "b", "a" and "b" don't constitute a
grapheme cluster, because they still have different FROM
and TO indices.  Now, suppose a GPOS feature is then applied
to "b" to adjust its x/y offsets so that "b" is placed
on "a".  In this case, it's up to the application whether to
handle them separately (the application should then be able to
put the cursor on the "a" part and the "b" part separately) or
to handle them as a single grapheme cluster.  Emacs does the
latter by forcing them to have the same FROM and TO indices (the
code mentioned by Yamamoto-san does this, in Ffont_shape_gstring
after calling font->driver->shape).
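
A minimal C sketch of the "same FROM and TO indices" idea described
above, for illustration only.  The struct and the function name are
hypothetical and much simpler than the real GSTRING; this is not the
code in Ffont_shape_gstring.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one GLYPH of a GSTRING;
   only the FROM and TO character indices are modelled.  */
struct sketch_glyph
{
  int from;   /* index of the first character covered (inclusive) */
  int to;     /* index of the last character covered (inclusive) */
};

/* Give glyphs[start] .. glyphs[end - 1] one shared FROM/TO range,
   so that they are treated as a single grapheme cluster.  */
void
sketch_share_from_to (struct sketch_glyph *glyphs, size_t start, size_t end)
{
  size_t i;
  int from, to;

  if (start >= end)
    return;

  /* Find the smallest FROM and the largest TO in the range.  */
  from = glyphs[start].from;
  to = glyphs[start].to;
  for (i = start + 1; i < end; i++)
    {
      if (glyphs[i].from < from)
        from = glyphs[i].from;
      if (glyphs[i].to > to)
        to = glyphs[i].to;
    }

  /* Assign the merged range to every glyph in the cluster.  */
  for (i = start; i < end; i++)
    {
      glyphs[i].from = from;
      glyphs[i].to = to;
    }
}
```

With "a" covering character 0 and "b" covering character 1, calling
sketch_share_from_to on both glyphs leaves each with FROM 0 and TO 1.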

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 20 Aug 2012 17:18:02 GMT) Full text and rfc822 format available.

Message #80 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 20 Aug 2012 20:16:59 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: jasonr <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> Date: Mon, 20 Aug 2012 23:57:53 +0900
> 
> In article <83mx1qd85g.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > > Or maybe I am misremembering, and it was more about the difficulty in
> > > figuring out which glyphs correspond to which characters in cases where
> > > there is not a one-to-one correspondence
> 
> > This difficulty is indeed there.  How does libotf solve this problem?
> 
> If one GSUB feature converts the input gstring "AB" to "ab",
> all we can say is that the resulting glyph sequence "ab"
> corresponds to "AB"; we can't say that "a" corresponds to "A"
> and "b" corresponds to "B".
> 
> So, m17nlib and libotf set the same FROM and TO indices on
> both "a" and "b" (in the above case, 0 and 1 respectively),
> and thus they constitute a single grapheme cluster, which Emacs
> treats as a single glyph on cursor movement.

Thanks.  But I wasn't asking about LGLYPH_SET_FROM and LGLYPH_SET_TO,
I was asking about LGLYPH_SET_CHAR.  In the Windows implementation, we
assign the same codepoint there to all the glyphs in the grapheme
cluster, while on GNU/Linux you showed output that suggested we put
different character codepoints there.

> But when one GSUB feature converts "A" to "a" and another
> converts "B" to "b", "a" and "b" don't constitute a
> grapheme cluster, because they still have different FROM
> and TO indices.  Now, suppose a GPOS feature is then applied
> to "b" to adjust its x/y offsets so that "b" is placed
> on "a".  In this case, it's up to the application whether to
> handle them separately (the application should then be able to
> put the cursor on the "a" part and the "b" part separately) or
> to handle them as a single grapheme cluster.  Emacs does the
> latter by forcing them to have the same FROM and TO indices (the
> code mentioned by Yamamoto-san does this, in Ffont_shape_gstring
> after calling font->driver->shape).

Thanks.  I've spent the best part of the last day reading about font
metrics, trying to understand the meaning of every component of the
gstring object.  I still don't get all of it, though.  Specifically,
it is still largely unclear what we use each component for in
drawing each glyph that belongs to a grapheme cluster.  One problem is
that terms like rbearing, lbearing, etc. are not always used in the
same sense as their definitions in digital typography references.

Could you please point to the documentation where the meaning of
gstring components is spelled out, or to code from which I could try
gleaning this information?  I see w32_compute_glyph_string_overhangs
and x_draw_composite_glyph_string_foreground in w32term.c -- are these
the places to look, or is there more?  What about lisp/composite.el --
is that still relevant for automatic compositions?

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 20 Aug 2012 17:25:02 GMT) Full text and rfc822 format available.

Message #83 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: handa <at> gnu.org, jasonr <at> gnu.org
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 20 Aug 2012 20:24:14 +0300

> Date: Sun, 19 Aug 2012 21:44:56 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> 
> >     2. Pass this array to ScriptLayout to generate a map of
> >        visual positions to logical positions.
> > 
> > There's no place in Emacs that calls ScriptLayout.  Isn't it
> > a problem?
> 
> I don't think so, at least not directly.  ScriptLayout actually draws
> the shaped glyphs on the screen.  Emacs doesn't use it because it
> draws the glyphs by itself, using the information in the glyph-strings
> generated from the data returned by the shaping engine.  Or am I
> missing something?

Answering my own question here: yes, I did miss something.  We do call
the 'draw' method of the font driver, in w32term.c.  However, the
Windows implementation of this is in w32font.c, and it calls
ExtTextOutW, not ScriptTextOut.  Which might be a problem, hmm...

As for ScriptLayout, it is only needed for logical-to-visual
reordering of _items_, whereas we always pass to Uniscribe a chunk of
text that can only become a single item.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Tue, 21 Aug 2012 09:21:02 GMT) Full text and rfc822 format available.

Message #86 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Tue, 21 Aug 2012 18:20:28 +0900

In article <83393hcwl0.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> Thanks.  But I wasn't asking about LGLYPH_SET_FROM and LGLYPH_SET_TO,
> I was asking about LGLYPH_SET_CHAR.  In the Windows implementation, we
> assign the same codepoint there to all the glyphs in the grapheme
> cluster, while on GNU/Linux you showed output that suggested we put
> different character codepoints there.

The CHAR slot of a GLYPH has no meaning after shaping, except
as debugging information.  So, font->driver->shape doesn't
have to worry about it that much.

> Thanks.  I've spent the best part of the last day reading about font
> metrics, trying to understand the meaning of every component of the
> gstring object.  I still don't get all of it, though.  Specifically,
> it is still largely unclear what we use each component for in
> drawing each glyph that belongs to a grapheme cluster.  One problem is
> that terms like rbearing, lbearing, etc. are not always used in the
> same sense as their definitions in digital typography references.

> Could you please point to the documentation where the meaning of
> gstring components is spelled out, or to code from which I could try
> gleaning this information?

The meaning of the elements (i.e. GLYPHs) in a GSTRING is
described in the docstring of composition-get-gstring as follows:

GLYPH is a vector whose elements have this form:
    [ FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT
      [ [X-OFF Y-OFF WADJUST] | nil] ]
where
    FROM-IDX and TO-IDX are used internally and should not be touched.
    C is the character of the glyph.
    CODE is the glyph-code of C in FONT-OBJECT.
    WIDTH thru DESCENT are the metrics (in pixels) of the glyph.
    X-OFF and Y-OFF are offsets to the base position for the glyph.
    WADJUST is the adjustment to the normal width of the glyph.

and the meanings of WIDTH, LBEARING, RBEARING, ASCENT, and
DESCENT are the same as in X's XCharStruct, which Emacs has
used for a long time (the man page of XLoadFont shows this info).

typedef struct {
	short lbearing;			/* origin to left edge of raster */
	short rbearing;			/* origin to right edge of raster */
	short width;			/* advance to next char's origin */
	short ascent;			/* baseline to top edge of raster */
	short descent;			/* baseline to bottom edge of raster */
	unsigned short attributes;	/* per char flags (not predefined) */
} XCharStruct;
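
As an illustration of how these five metrics relate to each other,
here is a small C sketch.  The struct and function names are
hypothetical, and the overhang formulas are only one plausible reading
of the XCharStruct definitions above, not a copy of
w32_compute_glyph_string_overhangs.

```c
/* Hypothetical struct holding the five metric slots of a GLYPH,
   with the XCharStruct meanings.  All values are in pixels.  */
struct sketch_metrics
{
  int width;      /* advance to the next glyph's origin */
  int lbearing;   /* origin to left edge of the ink */
  int rbearing;   /* origin to right edge of the ink */
  int ascent;     /* baseline to top edge of the ink */
  int descent;    /* baseline to bottom edge of the ink */
};

/* Horizontal extent of the ink.  */
int
sketch_ink_width (const struct sketch_metrics *m)
{
  return m->rbearing - m->lbearing;
}

/* Vertical extent of the ink.  */
int
sketch_ink_height (const struct sketch_metrics *m)
{
  return m->ascent + m->descent;
}

/* Pixels of ink to the left of the origin, or 0 if there are none.  */
int
sketch_left_overhang (const struct sketch_metrics *m)
{
  return m->lbearing < 0 ? -m->lbearing : 0;
}

/* Pixels of ink to the right of the advance width, or 0.  */
int
sketch_right_overhang (const struct sketch_metrics *m)
{
  return m->rbearing > m->width ? m->rbearing - m->width : 0;
}
```

For the Windows base glyph quoted earlier in this thread (WIDTH 8,
LBEARING 1, RBEARING 8, ASCENT 12, DESCENT 4), this gives an ink box
of 7 by 16 pixels and no overhang on either side.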

> I see w32_compute_glyph_string_overhangs
> and x_draw_composite_glyph_string_foreground in w32term.c -- are these
> the places to look, or is there more?  What about lisp/composite.el --
> is that still relevant for automatic compositions?

In the phase of ri->produce_glyphs, the function
autocmp_chars (in composite.c) is the entry point of
shaping.  It calls auto-compose-chars (in composite.el), and
that leads to a call of font->driver->shape (via
compose-gstring-for-graphic and Ffont_shape_gstring).  This
builds up the GSTRING.

Another task of this phase is to set (struct
glyph)->slice.cmp.from and ...cmp.to so that the actual
drawing routine knows which cluster of the GSTRING each (struct
glyph) object corresponds to.  For that,
composition_update_it (in composite.c) sets and updates
(struct composition_it)->from and ->to to indices into the
GSTRING, according to the value of (struct
composition_it)->reversed_p, and append_composite_glyph (in
xdisp.c) copies those values to (struct glyph)->slice.cmp.from
and ...cmp.to.

Now all the information is ready for the drawing routine.

In the phase of ri->write_glyphs, the function draw_glyphs
calls BUILD_COMPOSITE_GLYPH_STRING (via the macro
BUILD_GSTRING_GLYPH), which sets (struct
glyph_string)->cmp, (struct glyph_string)->cmp_from, and (struct
glyph_string)->cmp_to.  Next, draw_glyphs calls
ri->draw_glyph_string, which finally calls
x_draw_composite_glyph_string_foreground, which in turn calls
font->driver->draw.

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Tue, 21 Aug 2012 13:18:02 GMT) Full text and rfc822 format available.

Message #89 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Tue, 21 Aug 2012 22:16:51 +0900

In article <83sjbid9n3.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > one possibility is that Emacs's rendering engine (xdisp.c) expects
> > glyphs in a glyph-string are rendered in that order from left to
> > right, but the returned glyph-string on Windows should be rendered
> > in reverse order.

> You may be right, but it's hard to be sure.  At least the advances[]
> array returned by ScriptPlace seems to point into that direction.
> Here's what I see in the debugger:

>   Breakpoint 8, uniscribe_shape (lgstring=55041941) at w32uniscribe.c:373
>   373                       LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
[...]
>   (gdb) p advances[0]@nglyphs
>   $5 = {8, 0}
>   (gdb) p offsets[0]@nglyphs
>   $6 = {{
>       du = 0,
>       dv = 0
>     }, {
>       du = 1,
>       dv = -2
>     }}
>   (gdb) p chars[0]@2
>   $7 = L"\x639\x652"

> (Note that the fRTL member of items[0].a is set to TRUE.)  My
> understanding of the advances[] array is that it gives, for each glyph
> in the cluster, the number of pixels to advance to the right after
> drawing the glyph.  So the fact that it is 8 for the first (base)
> character and zero for the second one tells me that this grapheme
> cluster is supposed to be rendered in reverse order: first the Sukun,
> then Ayin at the same location, and then advance by 8 pixels for the
> next character.  Is this correct?

I think so.

> If it is correct, then how come the glyphs shown on GNU/Linux also
> have non-zero value of xadvance:

>   [0 1 1593 969 8 2 8 4 4 nil]
>   [0 1 1618 760 0 -6 -3 8 -11 [-9 2 0]]

Emacs draws the first glyph at its base point and advances
the base point 8 pixels to the right (because the WIDTH of
the first glyph is 8).  Then Emacs draws the second glyph
9 pixels left and 2 pixels up from the base point.  So, the
second glyph is drawn above the first glyph.
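
The horizontal part of this arithmetic can be sketched in C as
follows.  This is a simplified illustration under stated assumptions:
the struct and function names are hypothetical, only x positions are
modelled, and a glyph with an adjustment vector is assumed to advance
the pen by WADJUST instead of WIDTH.  It is not the code of
x_draw_composite_glyph_string_foreground.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a GLYPH for horizontal layout.  */
struct sketch_lglyph
{
  int width;            /* WIDTH slot: normal advance */
  int has_adjustment;   /* nonzero if the adjustment vector is non-nil */
  int xoff;             /* X-OFF, used only if has_adjustment */
  int wadjust;          /* WADJUST, used only if has_adjustment */
};

/* Store in draw_x[i] the x position at which glyph i is drawn,
   starting with the pen at PEN and moving left to right.
   Return the final pen position.  */
int
sketch_layout_ltr (const struct sketch_lglyph *glyphs, size_t n,
                   int pen, int *draw_x)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (glyphs[i].has_adjustment)
        {
          /* Draw relative to the already-advanced pen.  */
          draw_x[i] = pen + glyphs[i].xoff;
          pen += glyphs[i].wadjust;
        }
      else
        {
          /* Draw at the pen, then advance by the glyph's width.  */
          draw_x[i] = pen;
          pen += glyphs[i].width;
        }
    }
  return pen;
}
```

For the two GNU/Linux glyphs above (WIDTH 8 with no adjustment, then
WIDTH 0 with [-9 2 0]) and a pen starting at 0, the base glyph is
drawn at x = 0, the pen moves to 8, and the mark is drawn at
8 - 9 = -1, i.e. over the base glyph.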

> > For instance, in the above case, we may have to render glyphs in
> > this order (diacritical mark first):
> > 
> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> >   [0 1 1593 969 8 1 8 12 4 nil]

> I tried the naive patch below, but it didn't quite work.  It seems
> like those changes somehow prevented character composition.  Perhaps
> Handa-san could give me some guidance here.

Did your patch produce the above GSTRING?

> > I think the further debugging must be done by those who
> > knows uniscribe, w32font.c, and w32uniscribe.c.

> It's very hard, given that glyph-string documentation leaves a lot to
> be desired, and the way its various components are used during drawing
> is also left without clear documentation.  E.g., this:

>     FROM-IDX and TO-IDX are used internally and should not be touched.

> is not really helpful for explaining what are FROM-IDX and TO-IDX, so
> how can I figure out whether the code you asked about is doing TRT?

They are indices into the original character sequence of that
GSTRING.  If a glyph has the values N and M for them, that glyph
corresponds to the Nth through Mth (inclusive) characters.

> And without knowing what is each component of glyph-string used for
> during drawing, how can I compare the values produced by Uniscribe
> APIs with what glyph-string needs?  If someone could explain all those
> things, it would make debugging possible.  Otherwise, I'm just
> randomly poking around...

Please see the function
x_draw_composite_glyph_string_foreground (in xterm.c and
w32term.c).  It shows which components of GSTRING are used for
drawing (the last branch of the "if" condition).

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Tue, 21 Aug 2012 17:33:01 GMT) Full text and rfc822 format available.

Message #92 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Tue, 21 Aug 2012 20:32:32 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: jasonr <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> Date: Tue, 21 Aug 2012 22:16:51 +0900
> 
> > (Note that the fRTL member of items[0].a is set to TRUE.)  My
> > understanding of the advances[] array is that it gives, for each glyph
> > in the cluster, the number of pixels to advance to the right after
> > drawing the glyph.  So the fact that it is 8 for the first (base)
> > character and zero for the second one tells me that this grapheme
> > cluster is supposed to be rendered in reverse order: first the Sukun,
> > then Ayin at the same location, and then advance by 8 pixels for the
> > next character.  Is this correct?
> 
> I think so.

Well, it turns out that the truth is slightly different.  When the
Uniscribe shaper is handed a chunk of RTL text with the fLogicalOrder
flag set to TRUE, it prepares the glyphs in the logical order, but
assumes that they will be laid out in reverse.  In this reverse order,
the width advance value is applied _before_ drawing the glyph, and
positive width advance values move the pen to the _left_.  I found
this important information on some Web page, which I unfortunately can
no longer find.

In addition, it looks like in this "reverse" mode, the X-OFFSET value
is also interpreted in the reverse direction, so its sign must be
flipped for glyphs in RTL grapheme clusters.
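
To make the model in the two paragraphs above concrete, here is a
small C sketch of it.  It encodes only the description given here
(advance applied before drawing, positive advances move the pen left,
X-OFFSET sign flipped); the function name and the calling convention
are hypothetical, and this is not the change committed to
uniscribe_shape.

```c
#include <stddef.h>

/* Lay out N glyphs given in logical order for an RTL run.
   ADVANCES and DU stand for the advance widths and horizontal
   offsets returned by the shaper.  PEN starts at the right edge
   of the run.  Store in draw_x[i] the x position of glyph i and
   return the final pen position.  */
int
sketch_layout_rtl (const int *advances, const int *du, size_t n,
                   int pen, int *draw_x)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* The advance is applied before drawing, and moves left.  */
      pen -= advances[i];
      /* The horizontal offset is interpreted in reverse.  */
      draw_x[i] = pen - du[i];
    }
  return pen;
}
```

With the values shown in the debugger earlier (advances {8, 0}, du
{0, 1}) and the pen starting at 8, the Ayin is drawn at x = 0 and the
Sukun at x = -1.  Under this model that is the same relative placement
as the GNU/Linux glyphs produce (8 - 9 = -1).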

Armed with this knowledge, with the information you posted, and after
studying the drawing code in w32term.c, I made some semi-empirical
changes in uniscribe_shape that produce good results both with Arabic
and with Hebrew.  In a nutshell, I adjusted the X-OFFSET values for
the width of the base-character glyph.  The results are committed as
trunk revision 109726; as I only tested the modification on a small
sample of composed texts, please see if you can run more tests with as
complex compositions as you can find.

> > If it is correct, then how come the glyphs shown on GNU/Linux also
> > have non-zero value of xadvance:
> 
> >   [0 1 1593 969 8 2 8 4 4 nil]
> >   [0 1 1618 760 0 -6 -3 8 -11 [-9 2 0]]
> 
> Emacs draws the first glyph at its base point and advances
> the base point 8 pixels to the right (because the WIDTH of
> the first glyph is 8).  Then Emacs draws the second glyph
> 9 pixels left and 2 pixels up from the base point.  So, the
> second glyph is drawn above the first glyph.

I see.  This was somewhat counter-intuitive (why move first and then
correct that by negative offsets, instead of not moving until all the
glyphs in the cluster are drawn?).

> > > For instance, in the above case, we may have to render glyphs in
> > > this order (diacritical mark first):
> > > 
> > >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> > >   [0 1 1593 969 8 1 8 12 4 nil]
> 
> > I tried the naive patch below, but it didn't quite work.  It seems
> > like those changes somehow prevented character composition.  Perhaps
> > Handa-san could give me some guidance here.
> 
> Did your patch produce the above GSTRING?

Yes.  But I think swapping the glyphs in the cluster was not the right
idea, because it violates the assumptions in w32font_draw, the drawing
routine called by the font back-end.  That routine expects the first
glyph to be for the base character of the composition.

> Please see the function
> x_draw_composite_glyph_string_foreground (in xterm.c and
> w32term.c).  It shows which components of GSTRING are used for
> drawing (the last branch of the "if" condition).

Yes, that helped, thanks.

Steffan, can you try the latest trunk code, and see if there are any
problems left?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Wed, 22 Aug 2012 09:16:02 GMT) Full text and rfc822 format available.

Message #95 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Wed, 22 Aug 2012 18:15:05 +0900

In article <83lih8b173.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> Well, it turns out that the truth is slightly different.  When the
> Uniscribe shaper is handed a chunk of RTL text with the fLogicalOrder
> flag set to TRUE, it prepares the glyphs in the logical order, but
> assumes that they will be laid out in reverse.  In this reverse order,
> the width advance value is applied _before_ drawing the glyph, and
> positive width advance values move the pen to the _left_.  I found
> this important information on some Web page, which I unfortunately can
> no longer find.

> In addition, it looks like in this "reverse" mode, the X-OFFSET value
> is also interpreted in the reverse direction, so its sign must be
> flipped for glyphs in RTL grapheme clusters.

> Armed with this knowledge, with the information you posted, and after
> studying the drawing code in w32term.c, I made some semi-empirical
> changes in uniscribe_shape that produce good results both with Arabic
> and with Hebrew.  In a nutshell, I adjusted the X-OFFSET values for
> the width of the base-character glyph.  The results are committed as
> trunk revision 109726; as I only tested the modification on a small
> sample of composed texts, please see if you can run more tests with as
> complex compositions as you can find.

As I currently don't have an environment for building Emacs
on Windows, I have not been able to test it yet.

> > > If it is correct, then how come the glyphs shown on GNU/Linux also
> > > have non-zero value of xadvance:
> > 
> > >   [0 1 1593 969 8 2 8 4 4 nil]
> > >   [0 1 1618 760 0 -6 -3 8 -11 [-9 2 0]]
> > 
> > Emacs draws the first glyph at its base point and advances
> > the base point 8 pixels to the right (because the WIDTH of
> > the first glyph is 8).  Then Emacs draws the second glyph
> > 9 pixels left and 2 pixels up from the base point.  So, the
> > second glyph is drawn above the first glyph.

> I see.  This was somewhat counter-intuitive (why move first and then
> correct that by negative offsets, instead of not moving until all the
> glyphs in the cluster are drawn?).

I think it's more intuitive.  It draws glyphs the way you write
by hand.  The exact place to draw a dependent vowel depends
on the base consonant.  So you have to adjust the vowel's
base point of drawing anyway.

> > > > For instance, in the above case, we may have to render glyphs in
> > > > this order (diacritical mark first):
> > > > 
> > > >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> > > >   [0 1 1593 969 8 1 8 12 4 nil]
> > 
> > > I tried the naive patch below, but it didn't quite work.  It seems
> > > like those changes somehow prevented character composition.  Perhaps
> > > Handa-san could give me some guidance here.
> > 
> > > Did your patch produce the above GSTRING?

> Yes.  But I think swapping the glyphs in the cluster was not the right
> idea, because it violates the assumptions in w32font_draw, the drawing
> routine called by the font back-end.  That routine expects the first
> glyph to be for the base character of the composition.

As long as WIDTH, XOFF, YOFF, and WADJUST are correct, the
drawing routines should work even if a combining mark comes
first.  The code that expects the first glyph to be a base
is Ffont_shape_gstring.  If the shaped GSTRING returned from
font->driver->shape has the GLYPH sequence "Abc", where A's
offset vector [X-OFF Y-OFF WADJUST] is nil and b's and c's
offset vectors are not nil, Ffont_shape_gstring assumes that
"Abc" constitutes a grapheme cluster.
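
The grouping rule just described can be sketched in C like this.  It
is an illustration of the rule as stated here, with a hypothetical
function name and a plain array of flags in place of real GLYPH
objects; it is not the code of Ffont_shape_gstring, and the case of a
leading glyph that already has an adjustment is not modelled.

```c
#include <stddef.h>

/* HAS_ADJUSTMENT[i] is nonzero if glyph i has a non-nil
   [X-OFF Y-OFF WADJUST] vector.  Return the number of glyphs in
   the cluster that begins at START: one base glyph with a nil
   vector, followed by every consecutive glyph whose vector is
   non-nil.  Return 0 if START is out of range or glyph START is
   not a base (that case is not modelled in this sketch).  */
size_t
sketch_cluster_length (const int *has_adjustment, size_t n, size_t start)
{
  size_t len;

  if (start >= n || has_adjustment[start])
    return 0;

  len = 1;
  while (start + len < n && has_adjustment[start + len])
    len++;
  return len;
}
```

For the sequence "Abc" followed by another base glyph, i.e. flags
{0, 1, 1, 0}, the cluster starting at index 0 has length 3 and the
one starting at index 3 has length 1.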

Anyway, thank you very much for the patch.  I have not yet
tried it because I currently don't have an environment to
build Emacs on Windows.

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Wed, 22 Aug 2012 19:53:01 GMT) Full text and rfc822 format available.

Message #98 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Kenichi Handa <handa <at> gnu.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru,
	jasonr <at> gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Wed, 22 Aug 2012 21:52:07 +0200

> Kenichi Handa-4 Aug 18, 2012; 4:45am: 

> I first confirmed that the described problems of Arabic and
> Hebrew occur with Emacs running on Windows.  Typing C-u C-x
> = on the first Arabic character (U+0639) showed that "Courier
> New" font is used for it, and showed this composition
> information.
> 
> Composed with the following character(s) "ْ" using this font:
>   uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1 

I have the same font. 

---------------------------------

> Kenichi Handa-4, Aug 18, 2012; 11:19am:

> For Hebrew too, on Windows, I see the same problem as what
> Steffan <[hidden email]> reported:
> In article <[hidden email]>, Steffan <[hidden email]> writes:
>>> I choose "hebrew-full" as input-method.
>>>
>>> - After typing 'f' I get KAF
>>> - then by typing d I get GIMMEL
>>> - and after typing 'D' I get "the three point sign" (HEBREW POINT QUBUTS) not below the GIMMEL but the KAF! 

Yes, and I have the same font. (My OS is Windows 7 on a netbook.) In Arabic I see a similar bug (the vowel signs are in the wrong place), but there is a small difference. I hope this remark is useful:
 After choosing the arabic-input-method:
- Typing "h" (for "Alef" ا) and then "U" (for "Ayin" ع) gives the two characters correctly.
- But after typing "X" (for the "vowel" Sukun ْ) you get nothing. (In Hebrew you can see the QUBUTS, though also at the wrong position, on the FIRST character.) [This is THE ONLY difference!]
- Move the cursor to the beginning of the line (Ctrl-a) and you will see the Sukun (the small circle), BUT on the Alef, NOT on the Ayin.
-- If you then write anything on the SAME line, the Sukun disappears! It also disappears if you switch to another window (Alt-Tab) and go back to Emacs.
- Move the cursor to the beginning of the line again (Ctrl-a) and you will see the Sukun (the small circle) again.
-- If you type ENTER at the end of the line (Ctrl-e), you can write whatever you want and still see the Sukun.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Wed, 22 Aug 2012 21:42:01 GMT) Full text and rfc822 format available.

Message #101 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Kenichi Handa <handa <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru,
	jasonr <at> gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Wed, 22 Aug 2012 23:40:55 +0200


>> From: Kenichi Handa <handa <at> gnu.org>
>> Cc: jasonr <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
>> Date: Tue, 21 Aug 2012 22:16:51 +0900
>>
>>> (Note that the fRTL member of items[0].a is set to TRUE.) My
>>> understanding of the advances[] array is that it gives, for each glyph
>>> in the cluster, the number of pixels to advance to the right after
>>> drawing the glyph. So the fact that it is 8 for the first (base)
>>> character and zero for the second one tells me that this grapheme
>>> cluster is supposed to be rendered in reverse order: first the Sukun,
>>> then Ayin at the same location, and then advance by 8 pixels for the
>>> next character. Is this correct?
>>
>> I think so.
> 
> Well, it turns out that the truth is slightly different. When the
> Uniscribe shaper is handed a chunk of RTL text with the fLogicalOrder
> flag set to TRUE, it prepares the glyphs in the logical order, but
> assumes that they will be laid out in reverse. In this reverse order,
> the width advance value is applied _before_ drawing the glyph, and
> positive width advance values move the pen to the _left_. I found
> this important information on some Web page, which I unfortunately can
> no longer find.
> 
> In addition, it looks like in this "reverse" mode, the X-OFFSET value
> is also interpreted in the reverse direction, so its sign must be
> flipped for glyphs in RTL grapheme clusters.
> 
> Armed with this knowledge, with the information you posted, and after
> studying the drawing code in w32term.c, I made some semi-empirical
> changes in uniscribe_shape that produce good results both with Arabic
> and with Hebrew. In a nutshell, I adjusted the X-OFFSET values for
> the width of the base-character glyph. The results are committed as
> trunk revision 109726; as I only tested the modification on a small
> sample of composed texts, please see if you can run more tests with as
> complex compositions as you can find.
> 
>>> If it is correct, then how come the glyphs shown on GNU/Linux also
>>> have non-zero value of xadvance:
>>> [0 1 1593 969 8 2 8 4 4 nil]
>>> [0 1 1618 760 0 -6 -3 8 -11 [-9 2 0]]
>>
>> Emacs draws the first glyph at its base point and advances
>> the base point 8 pixels to the right (because the WIDTH of
>> the first glyph is 8). Then Emacs draws the second glyph at
>> 9 pixels left and 2 pixels up from the base point. So, the
>> second glyph is drawn above the first glyph.
> 
> I see. This was somewhat counter-intuitive (why move first and then
> correct that by negative offsets, instead of not moving until all the
> glyphs in the cluster are drawn?).
> 
>>>> For instance, in the above case, we may have to render glyphs in
>>>> this order (diacritical mark first):
>>>>
>>>> [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>>>> [0 1 1593 969 8 1 8 12 4 nil]
>>>
>>> I tried the naive patch below, but it didn't quite work. It seems
>>> like those changes somehow prevented character composition. Perhaps
>>> Handa-san could give me some guidance here.
>>
>> Did your patch produce the above GSTRING?
> 
> Yes. But I think swapping the glyphs in the cluster was not the right
> idea, because it violates the assumptions in w32font_draw, the drawing
> routine called by the font back-end. That routine expects the first
> glyph to be for the base character of the composition.
> 
>> Please see the function
>> x_draw_composite_glyph_string_foreground (in xterm.c and
>> w32term.c). It shows which component of GSTRING is used for
>> drawing (the last branch of the "if" condition).
> 
> Yes, that helped, thanks.
> 
> Steffan, can you try the latest trunk code, and see if there are any
> problems left?


 -- 

Sorry, I missed this. I don't know where I can get this trunk code (r109726?).
At http://alpha.gnu.org/gnu/emacs/windows/?C=M;O=A
the latest version is this:
 emacs-20120813-r109584-bin-i386.zip.sig     13-Aug-2012 12:32  287

Thanks

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 23 Aug 2012 02:50:01 GMT) Full text and rfc822 format available.

Message #104 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Steffan <smias <at> yandex.ru>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 23 Aug 2012 05:49:10 +0300

> From: Steffan <smias <at> yandex.ru>
> Cc: jasonr <at> gnu.org,11860 <at> debbugs.gnu.org,smias <at> yandex.ru,Kenichi Handa <handa <at> gnu.org>
> Date: Wed, 22 Aug 2012 23:40:55 +0200
> 
> > Steffan, can you try the latest trunk code, and see if there are any
> > problems left?
> 
> 
>  -- 
> 
> Sorry, I missed this. I don't know where I can get this trunk code (r109726?).
> At http://alpha.gnu.org/gnu/emacs/windows/?C=M;O=A
> the latest version is this:
>  emacs-20120813-r109584-bin-i386.zip.sig     13-Aug-2012 12:32  287

Then please wait for a few days until a newer snapshot is available at
that place, and try that.

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 23 Aug 2012 02:51:02 GMT) Full text and rfc822 format available.

Message #107 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Steffan <smias <at> yandex.ru>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 23 Aug 2012 05:50:02 +0300

> From: Steffan <smias <at> yandex.ru>
> Cc: jasonr <at> gnu.org,11860 <at> debbugs.gnu.org,smias <at> yandex.ru,Eli Zaretskii <eliz <at> gnu.org>
> Date: Wed, 22 Aug 2012 21:52:07 +0200
> 
>  After choosing the arabic-input-method:
> - Typing "h" (for "Alef" ا) and then "U" (for "Ayin" ع) gives the two characters correctly.
> - But after typing "X" (for the "vowel" Sukun ْ) you get nothing. (In Hebrew you can see the QUBUTS, though also at the wrong position, on the FIRST character.) [This is THE ONLY difference!]
> - Move the cursor to the beginning of the line (Ctrl-a) and you will see the Sukun (the small circle), BUT on the Alef, NOT on the Ayin.
> -- If you then write anything on the SAME line, the Sukun disappears! It also disappears if you switch to another window (Alt-Tab) and go back to Emacs.
> - Move the cursor to the beginning of the line again (Ctrl-a) and you will see the Sukun (the small circle) again.
> -- If you type ENTER at the end of the line (Ctrl-e), you can write whatever you want and still see the Sukun.

The latest trunk works correctly with this recipe: the Sukun is
positioned at the right place and does not disappear.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 27 Aug 2012 21:12:02 GMT) Full text and rfc822 format available.

Message #110 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 27 Aug 2012 23:10:57 +0200

I've just tested it. It's different. But it doesn't work correctly.

In the arabic-input-method 

h-X-SPACE-h works fine, 
but h-X-h doesn't work: the Sukun disappears.

(Hebrew seems to work correctly.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Wed, 29 Aug 2012 08:12:01 GMT) Full text and rfc822 format available.

Message #113 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Steffan <smias <at> yandex.ru>
Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Wed, 29 Aug 2012 17:09:49 +0900

In article <177391346101857 <at> web27f.yandex.ru>, Steffan <smias <at> yandex.ru> writes:

> I've just tested it. It's different. But it doesn't work correctly.
> In the arabic-input-method 

> h-X-SPACE-h works fine, 
> but h-X-h doesn't work: the Sukun disappears.

I also tested with the following windows binary
(emacs-trunk-r109787-bin-w32-i386.zip) and confirmed the problem:

In article <CAH8Pv0g81jwWvQJ1xjmnU=8DU8+h+Sb6TbMP_xfHhwiqYatO-w <at> mail.gmail.com>, Dani Moncayo <dmoncayo <at> gmail.com> writes:
> I've just uploaded a w32 binary (from today's trunk):
>   https://www.dropbox.com/sh/7jr3vbv9tm1zod0/jPuvfrJAe8

I evaluated the attached function in the *scratch* buffer and it
returned this string on Windows.

"  [0 1 1575 909 18 7 10 25 8 nil]
  [0 1 1575 760 0 7 11 25 8 [-1 1 0]]

  [0 1 1575 909 18 7 10 25 8 nil]
  [0 1 1575 760 0 7 11 25 8 [-19 1 0]]

"

The second GLYPHs in the first and second GSTRINGs are
different.  On GNU/Linux, they are the same, which should be
the correct behaviour.
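
The comparison can be made mechanical with a small sketch like the
one below.  The helper name sketch-adjustments is made up for this
example, and it assumes the offset vector is the last slot (index 9)
of each glyph vector, as in the output above.

```elisp
(defun sketch-adjustments (glyphs)
  "Return the ADJUSTMENT slot (index 9) of each glyph vector in GLYPHS.
Hypothetical helper for illustration; not part of Emacs."
  (mapcar (lambda (g) (aref g 9)) glyphs))

;; Compare the two GSTRINGs returned on Windows above.
(equal (sketch-adjustments
        '([0 1 1575 909 18 7 10 25 8 nil]
          [0 1 1575 760 0 7 11 25 8 [-1 1 0]]))
       (sketch-adjustments
        '([0 1 1575 909 18 7 10 25 8 nil]
          [0 1 1575 760 0 7 11 25 8 [-19 1 0]])))
;; => nil
```

A result of nil means the offsets differ between the two lines; with
a correct shaper the expression should return t.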

---
Kenichi Handa
handa <at> gnu.org

(defun check-arabic-shaper ()
  (let (str)
    (save-excursion
      (with-temp-buffer
	(insert "\u0627\u0652\u0627\n\u0627\u0652 \u0627")
	(switch-to-buffer (current-buffer))
	(sit-for 0)
	(save-excursion
	  (describe-char 1)
	  (set-buffer "*Help*")
	  (if (search-forward "by these glyphs:\n" nil t)
	      (let ((pos (point)))
		(search-forward "\n\n" nil 'move)
		(setq str (buffer-substring-no-properties pos (point))))))
	(save-excursion
	  (describe-char 5)
	  (set-buffer "*Help*")
	  (if (search-forward "by these glyphs:\n" nil t)
	      (let ((pos (point)))
		(search-forward "\n\n" nil 'move)
		(setq str (concat str (buffer-substring-no-properties pos (point)))))))))
    str))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Wed, 29 Aug 2012 08:59:01 GMT) Full text and rfc822 format available.

Message #116 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Kenichi Handa <handa <at> gnu.org>
Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Wed, 29 Aug 2012 10:57:34 +0200


> In article <177391346101857 <at> web27f.yandex.ru>, Steffan <smias <at> yandex.ru> writes:
> 
>> I've just tested it. It's different. But it doesn't work correctly.
>> In the arabic-input-method
>> h-X-SPACE-h works fine,
>> but h-X-h doesn't work: the Sukun disappears.
> 
> I also tested with the following windows binary
> (emacs-trunk-r109787-bin-w32-i386.zip) and confirmed the problem:


I tested this one:  emacs-20120827-r109788-bin-i386.zip         27-Aug-2012 02:55   58M  
from 
http://alpha.gnu.org/gnu/emacs/windows/?C=M;O=A

> In article <CAH8Pv0g81jwWvQJ1xjmnU=8DU8+h+Sb6TbMP_xfHhwiqYatO-w <at> mail.gmail.com>, Dani Moncayo <dmoncayo <at> gmail.com> writes:
> 
>> I've just uploaded a w32 binary (from today's trunk):
>> https://www.dropbox.com/sh/7jr3vbv9tm1zod0/jPuvfrJAe8
> 
> I evaluated the attached function in the *scratch* buffer and it
> returned this string on Windows.
> 
> " [0 1 1575 909 18 7 10 25 8 nil]
> [0 1 1575 760 0 7 11 25 8 [-1 1 0]]
> 
> [0 1 1575 909 18 7 10 25 8 nil]
> [0 1 1575 760 0 7 11 25 8 [-19 1 0]]
> 
> "
> 
> The second GLYPHs in the first and second GSTRINGs are
> different. On GNU/Linux, they are the same, which should be
> the correct behaviour.
> 
> ---
> Kenichi Handa
> handa <at> gnu.org
> 
> (defun check-arabic-shaper ()
>   (let (str)
>     (save-excursion
>       (with-temp-buffer
>         (insert "\u0627\u0652\u0627\n\u0627\u0652 \u0627")
>         (switch-to-buffer (current-buffer))
>         (sit-for 0)
>         (save-excursion
>           (describe-char 1)
>           (set-buffer "*Help*")
>           (if (search-forward "by these glyphs:\n" nil t)
>               (let ((pos (point)))
>                 (search-forward "\n\n" nil 'move)
>                 (setq str (buffer-substring-no-properties pos (point))))))
>         (save-excursion
>           (describe-char 5)
>           (set-buffer "*Help*")
>           (if (search-forward "by these glyphs:\n" nil t)
>               (let ((pos (point)))
>                 (search-forward "\n\n" nil 'move)
>                 (setq str (concat str (buffer-substring-no-properties pos (point)))))))))
>     str))

In the *scratch* buffer I get this result:

"  [0 1 1575 909 8 3 5 12 4 nil]
  [0 1 1575 760 0 3 6 12 4 [0 1 0]]

  [0 1 1575 909 8 3 5 12 4 nil]
  [0 1 1575 760 0 3 6 12 4 [-8 1 0]]

"

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sat, 01 Sep 2012 14:01:01 GMT) Full text and rfc822 format available.

Message #119 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sat, 01 Sep 2012 16:59:14 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: eliz <at> gnu.org,jasonr <at> gnu.org,11860 <at> debbugs.gnu.org,smias <at> yandex.ru
> Date: Wed, 29 Aug 2012 17:09:49 +0900
> 
> In article <177391346101857 <at> web27f.yandex.ru>, Steffan <smias <at> yandex.ru> writes:
> 
> > I've just tested it. It's different. But it doesn't work correctly.
> > In the arabic-input-method 
> 
> > h-X-SPACE-h works fine, 
> > but h-X-h doesn't work: the Sukun disappears.
> 
> I also tested with the following windows binary
> (emacs-trunk-r109787-bin-w32-i386.zip) and confirmed the problem:
> 
> In article <CAH8Pv0g81jwWvQJ1xjmnU=8DU8+h+Sb6TbMP_xfHhwiqYatO-w <at> mail.gmail.com>, Dani Moncayo <dmoncayo <at> gmail.com> writes:
> > I've just uploaded a w32 binary (from today's trunk):
> >   https://www.dropbox.com/sh/7jr3vbv9tm1zod0/jPuvfrJAe8
> 
> I evaluated the attached function in the *scratch* buffer and it
> returned this string on Windows.
> 
> "  [0 1 1575 909 18 7 10 25 8 nil]
>   [0 1 1575 760 0 7 11 25 8 [-1 1 0]]
> 
>   [0 1 1575 909 18 7 10 25 8 nil]
>   [0 1 1575 760 0 7 11 25 8 [-19 1 0]]
> 
> "
> 
> The second GLYPHs in the first and second GSTRINGs are
> different.  On GNU/Linux, they are the same, which should be
> the correct behaviour.

The problem was that the code I committed didn't expect to handle more
than a single grapheme cluster.  I now fixed that code for the case of
several grapheme clusters that are handed to the shaper all at once.
With the modified code (trunk revision 109842), both Steffan's recipe
and the check-arabic-shaper function work correctly.  Please test.

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sat, 01 Sep 2012 14:10:02 GMT) Full text and rfc822 format available.

Message #122 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Steffan <smias <at> yandex.ru>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sat, 01 Sep 2012 17:06:23 +0300

> From: Steffan <smias <at> yandex.ru>
> Cc: eliz <at> gnu.org,jasonr <at> gnu.org,11860 <at> debbugs.gnu.org,smias <at> yandex.ru
> Date: Wed, 29 Aug 2012 10:57:34 +0200
> 
> In the *scratch* buffer I get this result:
> 
> "  [0 1 1575 909 8 3 5 12 4 nil]
>   [0 1 1575 760 0 3 6 12 4 [0 1 0]]
> 
>   [0 1 1575 909 8 3 5 12 4 nil]
>   [0 1 1575 760 0 3 6 12 4 [-8 1 0]]
> 
> "

Right.  On Windows XP, I got a slightly different output (1-pixel
difference):

  "  [0 1 1575 909 8 4 5 12 4 nil]
    [0 1 1575 760 0 3 6 12 4 [0 0 0]]

    [0 1 1575 909 8 4 5 12 4 nil]
    [0 1 1575 760 0 3 6 12 4 [-8 0 0]]

  "

I guess the difference is due to the changes in Uniscribe on Windows 7.

After the change, I get this on XP:

  "  [0 1 1575 909 8 4 5 12 4 nil]
    [0 1 1575 760 0 3 6 12 4 [-8 0 0]]

    [0 1 1575 909 8 4 5 12 4 nil]
    [0 1 1575 760 0 3 6 12 4 [-8 0 0]]

  "

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 03 Sep 2012 13:58:02 GMT) Full text and rfc822 format available.

Message #125 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 03 Sep 2012 22:55:36 +0900

In article <83sjb1g7yl.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> The problem was that the code I committed didn't expect to handle more
> than a single grapheme cluster.  I now fixed that code for the case of
> several grapheme clusters that are handed to the shaper all at once.
> With the modified code (trunk revision 109842), both Steffan's recipe
> and the check-arabic-shaper function work correctly.  Please test.

I tested with this version:

  http://alpha.gnu.org/gnu/emacs/windows/emacs-20120903-r109861-bin-i386.zip

and confirmed that the problem was fixed.  The Arabic line
of the HELLO file is also displayed correctly.  By the way, it
seems that the "arial" font has a better OTF GPOS feature for
Arabic than the default font "courier new".

Try evaluating this on Windows:
  (set-fontset-font t 'arabic (font-spec :family "arial" :size 30))
and see the position of the upper vowels.

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 03 Sep 2012 15:34:01 GMT) Full text and rfc822 format available.

Message #128 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Kenichi Handa <handa <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru,
	jasonr <at> gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 03 Sep 2012 17:31:39 +0200


>> From: Kenichi Handa <handa <at> gnu.org>
>> Cc: eliz <at> gnu.org,jasonr <at> gnu.org,11860 <at> debbugs.gnu.org,smias <at> yandex.ru
>> Date: Wed, 29 Aug 2012 17:09:49 +0900
>>
>> In article <177391346101857 <at> web27f.yandex.ru>, Steffan <smias <at> yandex.ru> writes:
>>
>>> I've just tested it. It's different. But it doesn't work correctly.
>>> In the arabic-input-method
>>> h-X-SPACE-h works fine,
>>> but h-X-h doesn't work: the Sukun disappears.
>>
>> I also tested with the following windows binary
>> (emacs-trunk-r109787-bin-w32-i386.zip) and confirmed the problem:
>>
>> In article <CAH8Pv0g81jwWvQJ1xjmnU=8DU8+h+Sb6TbMP_xfHhwiqYatO-w <at> mail.gmail.com>, Dani Moncayo <dmoncayo <at> gmail.com> writes:
>>
>>> I've just uploaded a w32 binary (from today's trunk):
>>> https://www.dropbox.com/sh/7jr3vbv9tm1zod0/jPuvfrJAe8
>>
>> I evaluated the attached function in the *scratch* buffer and it
>> returned this string on Windows.
>>
>> " [0 1 1575 909 18 7 10 25 8 nil]
>> [0 1 1575 760 0 7 11 25 8 [-1 1 0]]
>>
>> [0 1 1575 909 18 7 10 25 8 nil]
>> [0 1 1575 760 0 7 11 25 8 [-19 1 0]]
>>
>> "
>>
>> The second GLYPHs in the first and second GSTRINGs are
>> different. On GNU/Linux, they are the same, which should be
>> the correct behaviour.
> 
> The problem was that the code I committed didn't expect to handle more
> than a single grapheme cluster. I now fixed that code for the case of
> several grapheme clusters that are handed to the shaper all at once.
> With the modified code (trunk revision 109842), both Steffan's recipe
> and the check-arabic-shaper function work correctly. Please test.
> 
> Thanks.


 -- 

Thanks, this bug is now fixed. But there is something wrong with the two diacritics (short vowels): ARABIC KASRA and ARABIC KASRATAN. They should appear UNDER the letters, not IN or OVER them.

Try a-A [Sheen-Kasra] or a-S [Sheen-Kasratan] or d-S [Ya-Kasratan]. 
But h-S or m-S has the correct form. (?)

I tested this trunk:
 emacs-20120903-r109861-bin-i386.zip         03-Sep-2012 03:07   58M  

The arabic letters don't have a constant form. This seems to be the problem (?): Try f-S and compare with f-S-f.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 03 Sep 2012 15:56:02 GMT) Full text and rfc822 format available.

Message #131 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 03 Sep 2012 18:53:34 +0300

[Message part 1 (text/plain, inline)]

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: smias <at> yandex.ru, jasonr <at> gnu.org, 11860 <at> debbugs.gnu.org
> Date: Mon, 03 Sep 2012 22:55:36 +0900
> 
> In article <83sjb1g7yl.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > The problem was that the code I committed didn't expect to handle more
> > than a single grapheme cluster.  I now fixed that code for the case of
> > several grapheme clusters that are handed to the shaper all at once.
> > With the modified code (trunk revision 109842), both Steffan's recipe
> > and the check-arabic-shaper function work correctly.  Please test.
> 
> I tested with this version:
> 
>   http://alpha.gnu.org/gnu/emacs/windows/emacs-20120903-r109861-bin-i386.zip
> 
> and confirmed that the problem was fixed.  The Arabic line
> of HELLO file is also displayed correctly.

Thanks for testing.  I will wait for Steffan's confirmation before
closing the bug report.

> By the way, it seems that "arial" font has better OTF GPOS feature
> for Arabic than the default font "courier new".
> 
> Try to evaluate this on Windows:
>   (set-fontset-font t 'arabic (font-spec :family "arial" :size 30))
> and see the position of upper vowels.

It indeed looks nicer, but its base line is too high, IMHO, and thus
the Arabic text looks awkward wrt surrounding Latin text, see the
attached snapshot.

[HELLO-Arial.PNG (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 03 Sep 2012 16:26:02 GMT) Full text and rfc822 format available.

Message #134 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Kenichi Handa <handa <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru,
	jasonr <at> gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 03 Sep 2012 18:24:13 +0200

> It indeed looks nicer, but its base line is too high, IMHO, and thus
> the Arabic text looks awkward wrt surrounding Latin text, see the
> attached snapshot.

Yes, I think the same size for Latin and Arabic text is better.

I hope the problem with KASRA and KASRATAN is not too difficult to solve.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 03 Sep 2012 16:30:02 GMT) Full text and rfc822 format available.

Message #137 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Steffan <smias <at> yandex.ru>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 03 Sep 2012 19:28:14 +0300

> From: Steffan <smias <at> yandex.ru>
> Cc: smias <at> yandex.ru,jasonr <at> gnu.org,11860 <at> debbugs.gnu.org,Kenichi Handa <handa <at> gnu.org>
> Date: Mon, 03 Sep 2012 17:31:39 +0200
> 
> Thanks, this bug is now fixed. But there is something wrong with the two diacritics (short vowels): ARABIC KASRA and ARABIC KASRATAN. They should appear UNDER the letters, not IN or OVER them.
> 
> Try a-A [Sheen-Kasra] or a-S [Sheen-Kasratan] or d-S [Ya-Kasratan]. 

Seems to work fine on XP.  I will have to try on Windows 7 later.

> The arabic letters don't have a constant form. This seems to be the problem (?): Try f-S and compare with f-S-f.

I don't understand what you are saying here.  Is something wrong with
how f-S or f-S-f are displayed?  If so, what exactly is wrong?  If
that's not what you meant, then what "seems to be the problem"?

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 03 Sep 2012 17:51:02 GMT) Full text and rfc822 format available.

Message #140 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, jasonr <at> gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 03 Sep 2012 19:49:04 +0200

[Message part 1 (text/plain, inline)]


>> From: Steffan <smias <at> yandex.ru>
>> Cc: smias <at> yandex.ru,jasonr <at> gnu.org,11860 <at> debbugs.gnu.org,Kenichi Handa <handa <at> gnu.org>
>> Date: Mon, 03 Sep 2012 17:31:39 +0200
>>
>> Thanks, this bug is now fixed. But there is something wrong with the two diacritics (short vowels): ARABIC KASRA and ARABIC KASRATAN. They should appear UNDER the letters, not IN or OVER them.
>>
>> Try a-A [Sheen-Kasra] or a-S [Sheen-Kasratan] or d-S [Ya-Kasratan].
> 
> Seems to work fine on XP. I will have to try on Windows 7 later.
> 
>> The arabic letters don't have a constant form. This seems to be the problem (?): Try f-S and compare with f-S-f.
> 
> I don't understand what you are saying here. Is something wrong with
> how f-S or f-S-f are displayed? If so, what exactly is wrong? If
> that's not what you meant, then what "seems to be the problem"?
> 
> Thanks.

(See the screenshot)

Well, comparing u-S with u-S-u is a better example. (u = ARABIC AIN)
- With u-S you see the KASRATAN inside the AIN, which is wrong.
- But after typing the second u, it is displayed correctly, because AIN (like most Arabic letters, and unlike Hebrew ones) changes its form when another letter follows: it gets smaller, and so the Kasratan gets the right position.
Emacs on Linux handles it correctly: the Kasratan CHANGES its position after typing the second AIN (u-S-u).



 --

[diacritics.jpg (image/jpeg, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Tue, 04 Sep 2012 09:04:02 GMT) Full text and rfc822 format available.

Message #143 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Tue, 04 Sep 2012 18:03:26 +0900

In article <83wr0bdrwh.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > By the way, it seems that "arial" font has better OTF GPOS feature
> > for Arabic than the default font "courier new".
> > 
> > Try to evaluate this on Windows:
> >   (set-fontset-font t 'arabic (font-spec :family "arial" :size 30))
> > and see the position of upper vowels.

> It indeed looks nicer, but its base line is too high, IMHO, and thus
> the Arabic text looks awkward wrt surrounding Latin text, see the
> attached snapshot.

That's perhaps because of ":size 30".  I specified it to
highlight the effect of GPOS.  If you don't specify it, it
seems that the result looks better, though, of course, I'm
not an expert in typesetting mixed Arabic-English text.

BTW, typesetting multiple scripts with different fonts is
a difficult task.

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Tue, 04 Sep 2012 17:19:01 GMT) Full text and rfc822 format available.

Message #146 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Steffan <smias <at> yandex.ru>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, jasonr <at> gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Tue, 04 Sep 2012 20:18:19 +0300

> From: Steffan <smias <at> yandex.ru>
> Cc: smias <at> yandex.ru,jasonr <at> gnu.org,11860 <at> debbugs.gnu.org,Kenichi Handa <handa <at> gnu.org>
> Date: Mon, 03 Sep 2012 17:31:39 +0200
> 
> Thanks, this bug is now fixed. But there is something wrong with the two diacritics (short vowels): ARABIC KASRA and ARABIC KASRATAN. They should appear UNDER the letters, not IN or OVER them.
> 
> Try a-A [Sheen-Kasra] or a-S [Sheen-Kasratan] or d-S [Ya-Kasratan]. 
> But h-S or m-S has the correct form. (?)

This happened because w32uniscribe.c didn't reverse the sign of the
y-offsets returned by the Uniscribe shaper.  This reversal is
necessary because the Y axes in font definition coordinates and in
Emacs screen coordinates point in opposite directions.

I see that ftfont.c consistently reverses the sign of y-offsets,
probably for the same reason.
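
As an illustration of the sign reversal only (not the actual code in
w32uniscribe.c or ftfont.c), here is a minimal Emacs Lisp sketch.
The function name and its arguments are hypothetical; it assumes the
shaper reports offsets in font coordinates, where positive Y points
up, while Emacs screen coordinates have positive Y pointing down.

```elisp
(defun sketch-shaper-offset-to-adjustment (dx dy wadjust)
  "Build an [X-OFF Y-OFF WADJUST] vector from shaper offsets DX and DY.
DY is in font coordinates (positive is up), so its sign is reversed
for screen coordinates (positive is down).
Hypothetical helper for illustration; not part of Emacs."
  (vector dx (- dy) wadjust))

;; A mark that the font places 3 units BELOW the baseline (DY = -3)
;; must get a positive screen Y-OFF so that it is drawn lower.
(sketch-shaper-offset-to-adjustment 0 -3 0)
;; => [0 3 0]
```

Without the negation, a below-base mark such as Kasra would be moved
up instead of down, which matches the symptom reported above.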

Fixed in trunk revision 109876.  Please test.

P.S. This bug affected all complex scripts, not just RTL or Arabic.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 06 Sep 2012 02:10:02 GMT) Full text and rfc822 format available.

Message #149 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
To: Steffan <smias <at> yandex.ru>
Cc: Kenichi Handa <handa <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
	11860 <at> debbugs.gnu.org
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 06 Sep 2012 11:09:20 +0900

[Message part 1 (text/plain, inline)]

>>>>> On Mon, 03 Sep 2012 19:49:04 +0200, Steffan <smias <at> yandex.ru> said:

> (See the screenshot)

> Well, comparing u-S with u-S-u is a better example. (S = ARABIC AIN)
> - With u-S you see the KASRATAN inside the AIN, which is wrong.
> - But after typing the second u, you get it correctly, because AIN (like most Arabic letters, and unlike Hebrew ones) changes its form when another letter is added: it gets smaller, and so the Kasratan gets the right position.
> Emacs on Linux handles it correctly: the Kasratan CHANGES its position after typing the second AIN (u-S-u).

Some of the examples don't look right with X11 on OS X to me, if I use
Arial 30pt.  See the screenshot with X11 (first) and with the Mac port
(second, with the patch in (*)).  Which font did you use when you
tried them on GNU/Linux?

(*): http://lists.gnu.org/archive/html/emacs-devel/2012-09/msg00157.html

				     YAMAMOTO Mitsuharu
				mituharu <at> math.s.chiba-u.ac.jp

[x11.png (image/png, inline)]

[mac.png (image/png, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 06 Sep 2012 08:53:01 GMT) Full text and rfc822 format available.

Message #152 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
Cc: Kenichi Handa <handa <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
	11860 <at> debbugs.gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 06 Sep 2012 10:52:31 +0200

[Message part 1 (text/plain, inline)]

>>>>>> On Mon, 03 Sep 2012 19:49:04 +0200, Steffan <smias <at> yandex.ru> said:
>>
>> (See the screenshot)
>> Well, comparing u-S with u-S-u is a better example. (S = ARABIC AIN)
>> - With u-S you see the KASRATAN inside the AIN, which is wrong.
>> - But after typing the second u, you get it correctly, because AIN (like most Arabic letters, and unlike Hebrew ones) changes its form when another letter is added: it gets smaller, and so the Kasratan gets the right position.
>> Emacs on Linux handles it correctly: the Kasratan CHANGES its position after typing the second AIN (u-S-u).
>
> Some of the examples don't look right with X11 on OS X to me, if I use
> Arial 30pt. See the screenshot with X11 (first) and with the Mac port
> (second, with the patch in (*)). Which font did you use when you
> tried them on GNU/Linux?
>
> (*): http://lists.gnu.org/archive/html/emacs-devel/2012-09/msg00157.html
>
> YAMAMOTO Mitsuharu
> mituharu <at> math.s.chiba-u.ac.jp

--

There are also problems on Linux with some Arabic fonts; for example, KacstNaskh has only one form for every Arabic letter! But it works with many fonts, such as Arial, Tholoth, DejaVu and Metal, which my Linux machine uses by default.

See the screenshots.

[arial2.png (image/png, attachment)]

[dejavu.png (image/png, attachment)]

[metal.png (image/png, attachment)]

[tholoth.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 06 Sep 2012 09:58:01 GMT) Full text and rfc822 format available.

Message #155 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
To: Steffan <smias <at> yandex.ru>
Cc: Kenichi Handa <handa <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
	11860 <at> debbugs.gnu.org, YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 06 Sep 2012 18:56:47 +0900

[Message part 1 (text/plain, inline)]

>>>>> On Thu, 06 Sep 2012 10:52:31 +0200, Steffan <smias <at> yandex.ru> said:

>> Some of the examples don't look right with X11 on OS X to me, if I
>> use Arial 30pt. See the screenshot with X11 (first) and with the
>> Mac port (second, with the patch in (*)). Which font did you use
>> when you tried them on GNU/Linux?
>> 
>> (*):
>> http://lists.gnu.org/archive/html/emacs-devel/2012-09/msg00157.html


> There are also problems on Linux with some Arabic fonts; for
> example, KacstNaskh has only one form for every Arabic letter! But
> it works with many fonts, such as Arial, Tholoth, DejaVu and Metal,
> which my Linux machine uses by default.

Thanks.  I also tried myself with Ubuntu 12.04 by installing Arial
using the ttf-mscorefonts-installer package.  It seems to install a
different version of the Arial font (2.82) than what's bundled with OS
X 10.8 (5.01.2x).  The result is also different from what I showed
before on OS X, but still doesn't look right unlike yours (see the
attached screenshot).

I used libotf 0.9.12, m17n-db 1.6.3, and m17n-lib 1.6.3 on both
platforms.

				     YAMAMOTO Mitsuharu
				mituharu <at> math.s.chiba-u.ac.jp

[arial-ubuntu1204.png (image/png, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 06 Sep 2012 10:48:02 GMT) Full text and rfc822 format available.

Message #158 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, mituharu <at> math.s.chiba-u.ac.jp,
	smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 06 Sep 2012 13:47:54 +0300

> Date: Thu, 06 Sep 2012 18:56:47 +0900
> From: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
> Cc: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>,
> 	Eli Zaretskii <eliz <at> gnu.org>,
> 	Kenichi Handa <handa <at> gnu.org>,
> 	11860 <at> debbugs.gnu.org
> 
> > There are also problems on Linux with some Arabic fonts; for
> > example, KacstNaskh has only one form for every Arabic letter! But
> > it works with many fonts, such as Arial, Tholoth, DejaVu and Metal,
> > which my Linux machine uses by default.
> 
> Thanks.  I also tried myself with Ubuntu 12.04 by installing Arial
> using the ttf-mscorefonts-installer package.  It seems to install a
> different version of the Arial font (2.82) than what's bundled with OS
> X 10.8 (5.01.2x).  The result is also different from what I showed
> before on OS X, but still doesn't look right unlike yours (see the
> attached screenshot).

Note that the Y-OFF value is zero in your snapshot, while it is 9 in
Steffan's one.  That's the immediate reason for the different display,
I think.  The question is, why that difference happens.  Perhaps
Handa-san could help us understand that.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 06 Sep 2012 14:53:02 GMT) Full text and rfc822 format available.

Message #161 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
Cc: Kenichi Handa <handa <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
	11860 <at> debbugs.gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 06 Sep 2012 16:52:20 +0200


>>>>>> On Thu, 06 Sep 2012 10:52:31 +0200, Steffan <smias <at> yandex.ru> said:
>>>
>>> Some of the examples don't look right with X11 on OS X to me, if I
>>> use Arial 30pt. See the screenshot with X11 (first) and with the
>>> Mac port (second, with the patch in (*)). Which font did you use
>>> when you tried them on GNU/Linux?
>>>
>>> (*):
>>> http://lists.gnu.org/archive/html/emacs-devel/2012-09/msg00157.html
>>
>> There are also problems on Linux with some Arabic fonts; for
>> example, KacstNaskh has only one form for every Arabic letter! But
>> it works with many fonts, such as Arial, Tholoth, DejaVu and Metal,
>> which my Linux machine uses by default.
> 
> Thanks. I also tried myself with Ubuntu 12.04 by installing Arial
> using the ttf-mscorefonts-installer package. It seems to install a
> different version of the Arial font (2.82) than what's bundled with OS
> X 10.8 (5.01.2x). The result is also different from what I showed
> before on OS X, but still doesn't look right unlike yours (see the
> attached screenshot).
> 
> I used libotf 0.9.12, m17n-db 1.6.3, and m17n-lib 1.6.3 on both
> platforms.
> 
> YAMAMOTO Mitsuharu
> mituharu <at> math.s.chiba-u.ac.jp

All my programs are older.
-  My Emacs version is 24.1.1 with GTK+ 2.20.1, and my Ubuntu version is 10.04 LTS (Lucid Lynx). I have libotf 0.9.10 and m17n 1.5.5.
-  I installed every program mentioned in the INSTALL file of Emacs, plus a lot of other things I don't really remember, like emacs-intl-fonts, xfonts-intl-arabic...

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 09 Sep 2012 04:07:02 GMT) Full text and rfc822 format available.

Message #164 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 09 Sep 2012 13:06:20 +0900

>>>>> On Sun, 19 Aug 2012 13:34:36 +0900, YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp> said:

>>>>> On Sat, 18 Aug 2012 11:45:27 +0900, Kenichi Handa <handa <at> gnu.org> said:
>> If this problem happens only for bidi scripts, one
>> possibility is that Emacs's rendering engine (xdisp.c)
>> expects glyphs in a glyph-string are rendered in that order
>> from left to right, but the returned glyph-string on Windows
>> should be rendered in reverse order.  For instance, in the
>> above case, we may have to render glyphs in this order
>> (diacritical mark first):

>> [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>> [0 1 1593 969 8 1 8 12 4 nil]

> The font backend driver on the Mac port is supposed to support
> right-to-left shaping (including for non-BMP chars, though I don't
> have a good example), and it gives the following result (diacritical
> mark comes first) for Courier New 13pt:

>   mac-ct:-*-Courier New-normal-normal-normal-*-13-*-*-*-m-0-iso10646-1
> by these glyphs:
>   [0 1 1618 760 8 0 2 11 -8 [-1 2 1]]
>   [0 1 1593 969 8 0 6 5 4 [-1 0 8]]

The above result was incorrect on a couple of points.  First, the
font backend driver for the Mac port had a bug (*1).  Second, OS X
10.7 and 10.8 seem to have a bug that makes them report incorrect
lbearing and rbearing values for Courier New (*2).  In particular, the
lbearing value is always reported as 0, as in the above result.

*1: http://lists.gnu.org/archive/html/emacs-devel/2012-09/msg00157.html
*2: http://openradar.appspot.com/10377021

Mac OS X 10.6 does not have the second issue, and with the patch in
(*1), it reports the following result:

  mac-ct:-*-Courier New-normal-normal-normal-*-13-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 1618 760 8 3 5 11 -8 [-1 2 0]]
  [0 1 1593 969 8 1 8 5 4 nil]

> In the above example, the grapheme cluster consists of glyphs having
> non-nil adjustments (the last element of each vector).  In the
> function Ffont_shape_gstring, there is some code that merges grapheme
> clusters generated by a font backend driver so each of them starts
> with a glyph having non-nil adjustment (except the first grapheme
> cluster of the gstring).  I think this is not correct especially for
> right-to-left text, and I disabled that part in the Mac port.  Could
> you give an example if you think this part is necessary?

The first glyph in the above result still has non-nil adjustments.
Another example is the Arabic "u-S-u" case for Arial 30pt (*3).  It
consists of the following two grapheme clusters (from right to left):

  [0 1 1613 755 0 1 7 2 4 [0 0 -3]]
  [0 1 0 971 16 -1 15 15 -4 nil]

  [2 2 0 970 14 1 15 13 7 [0 0 16]]

*3: http://lists.gnu.org/archive/html/bug-gnu-emacs/2012-09/msg00178.html

As you explained, the grapheme clusters are in logical order, and
glyphs in each grapheme cluster are in drawing order.  So simply
merging grapheme clusters as in the code in Ffont_shape_gstring does
not seem to be correct in the case of right-to-left text (what's drawn
later comes earlier in a merged grapheme cluster).

IMO, dividing glyphs into grapheme clusters is the font backend driver's
task, and I don't understand why Ffont_shape_gstring merges the
grapheme clusters for some cases.  Could you explain?

				     YAMAMOTO Mitsuharu
				mituharu <at> math.s.chiba-u.ac.jp

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 10 Sep 2012 16:15:01 GMT) Full text and rfc822 format available.

Message #167 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Steffan <smias <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, jasonr <at> gnu.org
Subject: Re:bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 10 Sep 2012 18:13:55 +0200


>> From: Steffan <smias <at> yandex.ru>
>> Cc: smias <at> yandex.ru,jasonr <at> gnu.org,11860 <at> debbugs.gnu.org,Kenichi Handa <handa <at> gnu.org>
>> Date: Mon, 03 Sep 2012 17:31:39 +0200
>>
>> Thanks, this bug is now fixed. But there is something wrong with the two diacritics (short vowels): ARABIC KASRA and ARABIC KASRATAN. They should appear UNDER the letters, not IN or OVER them.
>>
>> Try a-A [Sheen-Kasra] or a-S [Sheen-Kasratan] or d-S [Ya-Kasratan].
>> But h-S or m-S has the correct form. (?)
> 
> This happened because w32uniscribe.c didn't reverse the sign of the
> y-offsets returned by the Uniscribe shaper. This reversal is
> necessary because the Y axes in font definition coordinates and in
> Emacs screen coordinates point in opposite directions.
> 
> I see that ftfont.c consistently reverses the sign of y-offsets,
> probably for the same reason.
> 
> Fixed in trunk revision 109876. Please test.
> 
> P.S. This bug affected all complex scripts, not just RTL or Arabic.


 -- 

Is there another site to get the trunk?

http://alpha.gnu.org/gnu/emacs/windows/?C=M;O=A

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Tue, 11 Sep 2012 14:52:02 GMT) Full text and rfc822 format available.

Message #170 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Tue, 11 Sep 2012 23:49:40 +0900

In article <wlbohflu0z.wl%mituharu <at> math.s.chiba-u.ac.jp>, YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp> writes:

> As you explained, the grapheme clusters are in logical order, and
> glyphs in each grapheme cluster are in drawing order.  So simply
> merging grapheme clusters as in the code in Ffont_shape_gstring does
> not seem to be correct in the case of right-to-left text (what's drawn
> later comes earlier in a merged grapheme cluster).

Sure.

> IMO, dividing glyphs into grapheme clusters is the font backend driver's
> task, and I don't understand why Ffont_shape_gstring merges the
> grapheme clusters for some cases.  Could you explain?

When I designed it, I had in mind a situation where the
grapheme clusters returned by a font driver are too fine-grained
for Emacs's display engine, so that Ffont_shape_gstring must
combine some of them into one grapheme cluster.  But I agree
that it's much cleaner to make the font driver take care of
such things.

I'll try to fix Ffont_shape_gstring, and check whether or
not it breaks rendering of some scripts.

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Tue, 11 Sep 2012 17:50:02 GMT) Full text and rfc822 format available.

Message #173 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, mituharu <at> math.s.chiba-u.ac.jp
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Tue, 11 Sep 2012 20:48:36 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Date: Tue, 11 Sep 2012 23:49:40 +0900
> Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> 
> > IMO, dividing glyphs into grapheme clusters is the font backend driver's
> > task, and I don't understand why Ffont_shape_gstring merges the
> > grapheme clusters for some cases.  Could you explain?
> 
> When I designed it, I had in mind a situation where the
> grapheme clusters returned by a font driver are too fine-grained
> for Emacs's display engine, so that Ffont_shape_gstring must
> combine some of them into one grapheme cluster.  But I agree
> that it's much cleaner to make the font driver take care of
> such things.

AFAICS, all Ffont_shape_gstring does is modify the FROM and TO
components of the glyph-string.  For this to have any effect on the
screen, these components need to be used by the drawing routines.  But
the code in xterm.c and w32term.c that draws the composite characters
(x_draw_composite_glyph_string_foreground) doesn't seem to use these
components.  At least for w32term.c, we just draw the glyphs returned
by the shaper, one by one.

What am I missing?  Where does this "merge" of glyphs into a single
grapheme cluster come into play when displaying the glyphs?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Wed, 12 Sep 2012 13:17:01 GMT) Full text and rfc822 format available.

Message #176 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, mituharu <at> math.s.chiba-u.ac.jp
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Wed, 12 Sep 2012 22:14:50 +0900

In article <83k3w0wivf.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> AFAICS, all Ffont_shape_gstring does is modify the FROM and TO
> components of the glyph-string.  For this to have any effect on the
> screen, these components need to be used by the drawing routines.  But
> the code in xterm.c and w32term.c that draws the composite characters
> (x_draw_composite_glyph_string_foreground) doesn't seem to use these
> components.  At least for w32term.c, we just draw the glyphs returned
> by the shaper, one by one.

Each grapheme cluster is a "display element" in the sense of
get_next_display_element.  get_next_display_element calls
next_element_from_composition which calls
composition_update_it which updates it->cmp_it.from and
it->cmp_it.to from FROM and TO elements for LGLYPH.

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Wed, 12 Sep 2012 16:36:02 GMT) Full text and rfc822 format available.

Message #179 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, mituharu <at> math.s.chiba-u.ac.jp
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Wed, 12 Sep 2012 19:34:48 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: mituharu <at> math.s.chiba-u.ac.jp, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> Date: Wed, 12 Sep 2012 22:14:50 +0900
> 
> In article <83k3w0wivf.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > AFAICS, all Ffont_shape_gstring does is modify the FROM and TO
> > components of the glyph-string.  For this to have any effect on the
> > screen, these components need to be used by the drawing routines.  But
> > the code in xterm.c and w32term.c that draws the composite characters
> > (x_draw_composite_glyph_string_foreground) doesn't seem to use these
> > components.  At least for w32term.c, we just draw the glyphs returned
> > by the shaper, one by one.
> 
> Each grapheme cluster is a "display element" in the sense of
> get_next_display_element.  get_next_display_element calls
> next_element_from_composition which calls
> composition_update_it which updates it->cmp_it.from and
> it->cmp_it.to from FROM and TO elements for LGLYPH.

Yes, but wasn't this discussion about the effects of
Ffont_shape_gstring on drawing the resulting glyphs?
get_next_display_element has no bearing on that.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 13 Sep 2012 06:10:01 GMT) Full text and rfc822 format available.

Message #182 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, mituharu <at> math.s.chiba-u.ac.jp
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 13 Sep 2012 15:07:14 +0900

In article <83fw6nw66v.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> Yes, but wasn't this discussion about the effects of
> Ffont_shape_gstring on drawing the resulting glyphs?
> get_next_display_element has no bearing on that.

When a font driver returns the GSTRING "abcdef", how it is
segmented into clusters affects the display, especially in R2L
text.  If "abcdef" is segmented as "ab", "cd", "ef", it is
displayed as "efcdab", but if it is segmented as "abc",
"def", it is displayed as "defabc".  In addition, cursor
movement is also affected by how the GSTRING is segmented.
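
The effect of segmentation on right-to-left display can be reproduced with a small model. This is only a sketch, not Emacs code: the function name and the flat character-string representation of glyphs are assumptions made for illustration. Clusters are taken in logical order and emitted in reverse, while the glyphs inside each cluster keep their drawing order.

```c
#include <stddef.h>
#include <string.h>

/* Lay out GLYPHS for right-to-left display.  CLUSTER_LEN holds the
   length of each of the NCLUSTERS clusters, in logical order, and the
   lengths are assumed to sum to strlen (GLYPHS).  Clusters are emitted
   in reverse order; glyphs inside a cluster keep their drawing order.
   OUT must have room for strlen (GLYPHS) + 1 bytes.  */
void
layout_rtl (const char *glyphs, const size_t *cluster_len,
            size_t nclusters, char *out)
{
  size_t end = 0;               /* one past the cluster being copied */
  size_t i;
  char *p = out;

  for (i = 0; i < nclusters; i++)
    end += cluster_len[i];

  /* Walk the clusters from last to first.  */
  for (i = nclusters; i-- > 0; )
    {
      size_t start = end - cluster_len[i];
      memcpy (p, glyphs + start, cluster_len[i]);
      p += cluster_len[i];
      end = start;
    }
  *p = '\0';
}
```

With cluster lengths {2, 2, 2} the string "abcdef" comes out as "efcdab"; with {3, 3} it comes out as "defabc", matching the two cases described above.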

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 13 Sep 2012 17:02:02 GMT) Full text and rfc822 format available.

Message #185 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, mituharu <at> math.s.chiba-u.ac.jp
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Thu, 13 Sep 2012 20:00:17 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Cc: mituharu <at> math.s.chiba-u.ac.jp, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> Date: Thu, 13 Sep 2012 15:07:14 +0900
> 
> In article <83fw6nw66v.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Yes, but wasn't this discussion about the effects of
> > Ffont_shape_gstring on drawing the resulting glyphs?
> > get_next_display_element has no bearing on that.
> 
> When a font driver returns the GSTRING "abcdef", how it is
> segmented into clusters affects the display, especially in R2L
> text.  If "abcdef" is segmented as "ab", "cd", "ef", it is
> displayed as "efcdab", but if it is segmented as "abc",
> "def", it is displayed as "defabc".  In addition, cursor
> movement is also affected by how the GSTRING is segmented.

OK, but in this case we are talking about diacriticals, which are
always drawn in the same character cell as the base character.  IOW,
the pen does not advance until the entire gstring is drawn.  In that
case, whatever Ffont_shape_gstring does will not affect the result on
the screen, would it?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Thu, 13 Sep 2012 23:29:01 GMT) Full text and rfc822 format available.

Message #188 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, mituharu <at> math.s.chiba-u.ac.jp
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Fri, 14 Sep 2012 08:26:29 +0900

In article <83vcfhvowu.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> OK, but in this case we are talking about diacriticals, which are
> always drawn in the same character cell as the base character.  IOW,
> the pen does not advance until the entire gstring is drawn.  In that
> case, whatever Ffont_shape_gstring does will not affect the result on
> the screen, would it?

If a font driver decides to adjust the drawing position of a
base character, Ffont_shape_gstring wrongly combines that
character with the previous cluster, which results in a wrong
display position for that base character.

For instance, provided uppercase letters are base characters and
lowercase letters are diacriticals, an RTL text "AaBbCc" should be
displayed as "CcBbAa", but if Ffont_shape_gstring wrongly
segments it as "AaBb" and "Cc", it is displayed as "CcAaBb".
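
The wrong merge can be traced with a standalone model of the merging loop that the patch later in this report removes from Ffont_shape_gstring. This is a sketch under stated assumptions: `struct glyph` and `merge_adjusted_glyphs` are hypothetical stand-ins for the Lisp glyph vectors and the LGLYPH_* macros, and only the FROM, TO and "has an adjustment" fields are modelled.

```c
/* Simplified stand-in for an Emacs glyph: only the character range it
   covers and whether the font driver gave it a position adjustment.  */
struct glyph { int from, to; int adjusted; };

/* Model of the merging loop: a glyph with an adjustment is folded into
   the cluster that precedes it; a glyph without one starts a new
   cluster.  N must be at least 1.  */
void
merge_adjusted_glyphs (struct glyph *g, int n)
{
  int from = g[0].from, to = g[0].to;
  int i, j = 0;

  for (i = 1; i < n; i++)
    {
      if (!g[i].adjusted)
        {
          /* Close the cluster [j, i) and start a new one at i.  */
          if (j < i - 1)
            for (; j < i; j++)
              {
                g[j].from = from;
                g[j].to = to;
              }
          from = g[i].from;
          to = g[i].to;
          j = i;
        }
      else
        {
          /* Widen the current cluster to include this glyph.  */
          if (from > g[i].from)
            from = g[i].from;
          if (to < g[i].to)
            to = g[i].to;
        }
    }
  if (j < i - 1)
    for (; j < i; j++)
      {
        g[j].from = from;
        g[j].to = to;
      }
}
```

Suppose the glyphs for "AaBbCc" cover characters 0-1, 0-1, 2-3, 2-3, 4-5, 4-5, and the driver adjusted the base glyph B as well as the three diacriticals. After the call, the first four glyphs all cover 0-3 and the last two cover 4-5, which is the "AaBb" plus "Cc" segmentation described above.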

---
Kenichi Handa
handa <at> gnu.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 16 Sep 2012 12:06:01 GMT) Full text and rfc822 format available.

Message #191 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru, mituharu <at> math.s.chiba-u.ac.jp
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 16 Sep 2012 21:03:25 +0900

In article <87bohcty0r.fsf <at> gnu.org>, Kenichi Handa <handa <at> gnu.org> writes:

> I'll try to fix Ffont_shape_gstring, and check whether or
> not it breaks rendering of some scripts.

I've just installed this change.  With this change, I tried
to render several scripts that require CTL (complex text
layout) while setting a breakpoint at this line:
  shaper_error:
    return Qnil;
but Emacs never reached that line on a GNU/Linux system
(i.e. with m17n-lib and libotf).

---
Kenichi Handa
handa <at> gnu.org

=== modified file 'src/font.c'
--- src/font.c	2012-09-15 07:06:56 +0000
+++ src/font.c	2012-09-16 11:47:45 +0000
@@ -4295,7 +4295,10 @@
 header of the glyph-string.
 
 If the shaping was successful, the value is GSTRING itself or a newly
-created glyph-string.  Otherwise, the value is nil.  */)
+created glyph-string.  Otherwise, the value is nil.
+
+See the documentation of `composition-get-gstring' for the format of
+GSTRING.  */)
   (Lisp_Object gstring)
 {
   struct font *font;
@@ -4326,44 +4329,45 @@
   if (XINT (n) < LGSTRING_GLYPH_LEN (gstring))
     LGSTRING_SET_GLYPH (gstring, XINT (n), Qnil);
 
+  /* Check FROM_IDX and TO_IDX of each GLYPH in GSTRING to assure that
+     GLYPHS covers all characters in GSTRING.  More formally, provided
+     that NCHARS is the number of characters in GSTRING, N is the
+     number of glyphs, and GLYPHS[i] is the ith glyph, FROM_IDX and
+     TO_IDX of each glyph must satisfy these conditions:
+
+       GLYPHS[0].FROM_IDX == 0
+       GLYPHS[i].FROM_IDX <= GLYPHS[i].TO_IDX
+       if (GLYPHS[i].FROM_IDX == GLYPHS[i-1].FROM_IDX)
+         ;; GLYPHS[i] and GLYPHS[i-1] belongs to the same grapheme cluster
+         GLYPHS[i].TO_IDX == GLYPHS[i-1].TO_IDX
+       else
+         ;; Be sure to cover all characters.
+         GLYPHS[i].FROM_IDX == GLYPHS[i-1].TO_IDX + 1
+       GLYPHS[N-1].TO_IDX == NCHARS - 1 */
   glyph = LGSTRING_GLYPH (gstring, 0);
   from = LGLYPH_FROM (glyph);
   to = LGLYPH_TO (glyph);
-  for (i = 1, j = 0; i < LGSTRING_GLYPH_LEN (gstring); i++)
+  if (from != 0 || to < from)
+    goto shaper_error;
+  for (i = 1; i < LGSTRING_GLYPH_LEN (gstring); i++)
     {
-      Lisp_Object this = LGSTRING_GLYPH (gstring, i);
-
-      if (NILP (this))
+      glyph = LGSTRING_GLYPH (gstring, i);
+      if (NILP (glyph))
 	break;
-      if (NILP (LGLYPH_ADJUSTMENT (this)))
-	{
-	  if (j < i - 1)
-	    for (; j < i; j++)
-	      {
-		glyph = LGSTRING_GLYPH (gstring, j);
-		LGLYPH_SET_FROM (glyph, from);
-		LGLYPH_SET_TO (glyph, to);
-	      }
-	  from = LGLYPH_FROM (this);
-	  to = LGLYPH_TO (this);
-	  j = i;
-	}
-      else
-	{
-	  if (from > LGLYPH_FROM (this))
-	    from = LGLYPH_FROM (this);
-	  if (to < LGLYPH_TO (this))
-	    to = LGLYPH_TO (this);
-	}
+      if (! (LGLYPH_FROM (glyph) <= LGLYPH_TO (glyph)
+	     && (LGLYPH_FROM (glyph) == from
+		 ? LGLYPH_TO (glyph) == to
+		 : LGLYPH_FROM (glyph) == to + 1)))
+	goto shaper_error;
+      from = LGLYPH_FROM (glyph);
+      to = LGLYPH_TO (glyph);
     }
-  if (j < i - 1)
-    for (; j < i; j++)
-      {
-	glyph = LGSTRING_GLYPH (gstring, j);
-	LGLYPH_SET_FROM (glyph, from);
-	LGLYPH_SET_TO (glyph, to);
-      }
+  if (to != LGSTRING_CHAR_LEN (gstring) - 1)
+    goto shaper_error;
   return composition_gstring_put_cache (gstring, XINT (n));
+
+ shaper_error:
+  return Qnil;
 }
 
 DEFUN ("font-variation-glyphs", Ffont_variation_glyphs, Sfont_variation_glyphs,
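
The FROM_IDX/TO_IDX conditions listed in the patch comment above can be exercised outside Emacs with a small standalone checker. This is a sketch, not the Emacs implementation: `struct glyph_range` and `glyph_ranges_valid` are hypothetical names, plain integers replace the Lisp glyph objects, and the guard for an empty array is an addition that the patch does not need.

```c
/* Simplified stand-in for the FROM and TO indices of one glyph.  */
struct glyph_range { int from, to; };

/* Return 1 if the N glyph ranges in G cover all NCHARS characters in
   the way the patch comment requires, 0 otherwise.  */
int
glyph_ranges_valid (const struct glyph_range *g, int n, int nchars)
{
  int from, to, i;

  if (n < 1)                    /* extra guard, not in the patch */
    return 0;
  from = g[0].from;
  to = g[0].to;
  if (from != 0 || to < from)
    return 0;
  for (i = 1; i < n; i++)
    {
      if (g[i].from > g[i].to)
        return 0;
      /* Same cluster as the previous glyph: TO must match.
         New cluster: it must start right after the previous one.  */
      if (g[i].from == from ? g[i].to != to : g[i].from != to + 1)
        return 0;
      from = g[i].from;
      to = g[i].to;
    }
  return to == nchars - 1;
}
```

For three characters, the ranges 0-1, 0-1, 2-2 pass the check. The ranges 0-0, 2-2 fail because character 1 is not covered, which corresponds to the `shaper_error` path in the patch.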

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 16 Sep 2012 12:44:01 GMT) Full text and rfc822 format available.

Message #194 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 16 Sep 2012 15:41:49 +0300

> From: Kenichi Handa <handa <at> gnu.org>
> Date: Sun, 16 Sep 2012 21:03:25 +0900
> Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> 
> In article <87bohcty0r.fsf <at> gnu.org>, Kenichi Handa <handa <at> gnu.org> writes:
> 
> > I'll try to fix Ffont_shape_gstring, and check whether or
> > not it breaks rendering of some scripts.
> 
> I've just installed this change.

Thanks.

> With this change, I tried
> to render several scripts that require CTL (complex text
> layout) while setting a breakpoint at this line:
>   shaper_error:
>     return Qnil;
> but Emacs never reached that line on a GNU/Linux system
> (i.e. with m17n-lib and libotf).

Can you post here a few recipes you tested, so that I could try that
on MS-Windows?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 16 Sep 2012 15:45:02 GMT) Full text and rfc822 format available.

Message #197 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Kenichi Handa <handa <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 16 Sep 2012 11:43:28 -0400

> Can you post here a few recipes you tested, so that I could try that
> on MS-Windows?

How 'bout adding them somewhere in the `test' directory?


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Sun, 16 Sep 2012 15:52:01 GMT) Full text and rfc822 format available.

Message #200 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 16 Sep 2012 18:50:29 +0300

> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Kenichi Handa <handa <at> gnu.org>,  11860 <at> debbugs.gnu.org,  smias <at> yandex.ru
> Date: Sun, 16 Sep 2012 11:43:28 -0400
> 
> > Can you post here a few recipes you tested, so that I could try that
> > on MS-Windows?
> 
> How 'bout adding them somewhere in the `test' directory?

I need first to get confirmation that this bug is fixed on Windows.
The OP can only use pre-compiled binaries, so did not yet try my
latest changes.

The proper place is probably test/redisplay-testsuite.el.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 17 Sep 2012 15:59:03 GMT) Full text and rfc822 format available.

Message #203 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 17 Sep 2012 23:08:22 +0900

In article <83obl6q856.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > > Can you post here a few recipes you tested, so that I could try that
> > > on MS-Windows?
> > 
> > How 'bout adding them somewhere in the `test' directory?

> I need first to get confirmation that this bug is fixed on Windows.
> The OP can only use pre-compiled binaries, so did not yet try my
> latest changes.

> The proper place is probably test/redisplay-testsuite.el.

The sample text files I used were generated by this script
after I installed various language support packages on my
system (Mint):

------------------------------------------------------------
#!/bin/sh
LOCALES="ar bn my gu hi kn lo or ta th bo"
FILES=

for L in $LOCALES; do
  rm -f "$L".txt
  for MO in /usr/share/locale/"$L"/LC_MESSAGES/*.mo; do
    msgunfmt "$MO" >> "$L".txt
  done
  FILES="$FILES $L.txt"
done

tar cfz sample.tar.gz $FILES
------------------------------------------------------------

And for each ??.txt file, I used this Lisp file:

----- shaper-test.el ---------------------------------------
(let ((coding-system-for-read 'utf-8))
  (find-file (car command-line-args-left))
  (sit-for 0)
  (while (not (pos-visible-in-window-p (point-max)))
    (scroll-up)
    (sit-for 0))
  (kill-emacs))
------------------------------------------------------------

like this:

(gdb) br font.c:4367
(gdb) run -Q -l shaper-test.el ar.txt
(gdb) run -Q -l shaper-test.el bn.txt
  ....
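
For running the same check over every sample file outside gdb, a small
wrapper along these lines might help.  This is only a sketch, not what
was used for the runs above: the function name `shaper_commands` is
hypothetical, and it assumes shaper-test.el and the generated ??.txt
files sit in the current directory.  It only prints the command lines
(a dry run).

```shell
#!/bin/sh
# Same locale list as in the generator script.
LOCALES="ar bn my gu hi kn lo or ta th bo"

# Hypothetical helper: print one Emacs command line per sample file.
# Nothing is executed here; the output is just text.
shaper_commands() {
  for L in $LOCALES; do
    echo "emacs -Q -l shaper-test.el $L.txt"
  done
}

shaper_commands
```

Piping the output to `sh` would actually launch Emacs once per file.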

The attached is a tarball of generated ??.txt files.  Isn't
it too big to put under "test" subdir?

---
Kenichi Handa
handa <at> gnu.org

[sample.tar.gz (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 17 Sep 2012 17:00:02 GMT) Full text and rfc822 format available.

Message #206 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Kenichi Handa <handa <at> gnu.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Mon, 17 Sep 2012 12:58:00 -0400

> The attached is a tarball of generated ??.txt files.
> Isn't it too big to put under "test" subdir?

Yes, it's too big, tho the scripts that generated them aren't.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Mon, 17 Aug 2020 22:46:01 GMT) Full text and rfc822 format available.

Message #209 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Steffan <smias <at> yandex.ru>
Cc: 11860 <at> debbugs.gnu.org
Subject: Re: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels)
 don't appear
Date: Mon, 17 Aug 2020 22:45:37 +0000

Steffan <smias <at> yandex.ru> writes:

> Hello,
>
> the diacritics characters (harakat, short vowels of arabic) don't appear in windows. (In linux it works very fine)
>
> These are the unicode names:
> U+064B (1611)	‏ً‎	Arabisches Fathatan	ARABIC FATHATAN
> U+064C (1612)	‏ٌ‎	Arabisches Dammatan	ARABIC DAMMATAN
> U+064D (1613)	‏ٍ‎	Arabisches Kasratan	ARABIC KASRATAN
> U+064E (1614)	‏َ‎	Arabisches Fatha	ARABIC FATHA
> U+064F (1615)	‏ُ‎	Arabisches Damma	ARABIC DAMMA
> U+0650 (1616)	‏ِ‎	Arabisches Kasra	ARABIC KASRA
> U+0651 (1617)	‏ّ‎	Arabisches Schadda	ARABIC SHADDA
> U+0652 (1618)	‏ْ‎	Arabisches Sukun	ARABIC SUKUN
>
> All these characters doesn't appear. Or I can see them shortly if the cursor is on one of them.
>
> I've tried many fonts, but it doesn't work. These special characters are in the file, emacs don't loose them. If I copy the text to another editor, I can see them.
>
> Should I download other packages?

(That was 8 years ago.)

Do you still see this on a recent version of Emacs, such as the recently
released version 27.1?

Would this have been fixed by the recent addition of harfbuzz support?

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#11860; Package emacs. (Tue, 18 Aug 2020 04:41:02 GMT) Full text and rfc822 format available.

Message #212 received at 11860 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1;
 Arabic - Harakat (diacritics, short vowels) don't appear
Date: Tue, 18 Aug 2020 07:40:06 +0300

> From: Stefan Kangas <stefan <at> marxist.se>
> Date: Mon, 17 Aug 2020 22:45:37 +0000
> Cc: 11860 <at> debbugs.gnu.org
> 
> > the diacritics characters (harakat, short vowels of arabic) don't appear in windows. (In linux it works very fine)

I see them on Windows in Emacs 27.1.  So either this problem was fixed
in the meantime, or the problem was caused by some non-default font
the OP uses.  (Here those characters are displayed using the default
Courier New font.)

Reply sent to Stefan Kangas <stefan <at> marxist.se>:
You have taken responsibility. (Tue, 18 Aug 2020 09:48:02 GMT) Full text and rfc822 format available.

Notification sent to Steffan <smias <at> yandex.ru>:
bug acknowledged by developer. (Tue, 18 Aug 2020 09:48:02 GMT) Full text and rfc822 format available.

Message #217 received at 11860-done <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11860-done <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: Re: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels)
 don't appear
Date: Tue, 18 Aug 2020 09:47:25 +0000

Eli Zaretskii <eliz <at> gnu.org> writes:

>> > the diacritics characters (harakat, short vowels of arabic) don't appear in windows. (In linux it works very fine)
>
> I see them on Windows in Emacs 27.1.  So either this problem was fixed
> in the meantime, or the problem was caused by some non-default font
> the OP uses.  (Here those characters are displayed using the default
> Courier New font.)

Thanks.  I'm therefore closing this bug.

If anyone is still seeing this, please reply to this email (use "Reply
to all" in your email client) and we can reopen the bug report.

Best regards,
Stefan Kangas

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 15 Sep 2020 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 242 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
