GNU bug report logs - #24405
24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: emacs; Reported by: Oleksandr Gavenko <gavenkoa@HIDDEN>; Keywords: notabug; dated Sat, 10 Sep 2016 08:35:01 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.
Added tag(s) notabug. Request was from Eli Zaretskii <eliz@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 24405 <at> debbugs.gnu.org:


Date: Sat, 10 Sep 2016 13:05:09 +0300
Message-Id: <83lgz083ze.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Oleksandr Gavenko <gavenkoa@HIDDEN>
In-reply-to: <87mvjgupau.fsf@HIDDEN> (message from Oleksandr
 Gavenko on Sat, 10 Sep 2016 11:33:45 +0300)
Subject: Re: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect
 ``word-combining-categories`` for word boundaries on changing
 between latin/phonetic scripts.
References: <87mvjgupau.fsf@HIDDEN>
Cc: 24405 <at> debbugs.gnu.org

tags 24405 + notabug
thanks

> From: Oleksandr Gavenko <gavenkoa@HIDDEN>
> Date: Sat, 10 Sep 2016 11:33:45 +0300
> 
> Evaluate the following form with C-x C-e:
> 
>   (let ((word-combining-categories '((?l . ?y) (?y . ?l) (?l . ?l)))
>         (word-separating-categories nil))
>     (forward-word))
> 
>   HelloПривLLжɪəʊheləʊaiɪa
> 
> Point stopped between ʊ and h.
> 
> I have:
> 
>   (aref char-script-table ?ʊ)  ; => phonetic
>   (aref char-script-table ?h)  ; => latin
>   (aref char-script-table ?ж)  ; => cyrillic
> 
>   (category-set-mnemonics (char-category-set ?ʊ))  ; => ".Ljl"
>   (category-set-mnemonics (char-category-set ?h))  ; => ".Lalr"
> 
>   (category-docstring ?y)  ; => "Cyrillic"
>   (category-docstring ?l)  ; => "Latin"
> 
> I expected point to move to the last character before the newline.
> 
> It seems that:
> 
>   (?l . ?y) (?y . ?l)
> 
> have an effect, because point moved across the Cyrillic/Latin and
> Cyrillic/Phonetic boundaries but refused to cross the Latin/Phonetic
> boundary.
> 
> If this is the intended behavior, how can I make Emacs move across
> Latin/Phonetic boundaries?

You can't do this for two characters that belong to different scripts
but share the relevant category in their category sets.  Both of these
characters have the 'l' (Latin) category in their sets, so you cannot
make Emacs stop treating the position between them as a word boundary.
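
To see which categories two characters have in common, a small helper
along the following lines can be evaluated.  This is only a sketch:
`my/shared-categories' is a hypothetical name, not part of Emacs, and
the code assumes that a category set is a bool-vector indexed by
category mnemonic.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my/shared-categories (c1 c2)
  "Return a string of the category mnemonics that both C1 and C2 have."
  (let ((set1 (char-category-set c1))
        (set2 (char-category-set c2))
        (shared '()))
    ;; A category set is a bool-vector; element N is non-nil when the
    ;; character belongs to the category whose mnemonic has code N.
    (dotimes (cat (length set1))
      (when (and (aref set1 cat) (aref set2 cat))
        (push cat shared)))
    ;; `shared' holds character codes; turn them into a string.
    (concat (nreverse shared))))
```

Evaluating (my/shared-categories ?ʊ ?h) with the category sets quoted
above should give a result that includes "l".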

For the same reason, including a cons cell whose members are
identical, such as (?l . ?l), has no effect.
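
The rule described above can be modeled roughly as follows.  This is a
sketch under stated assumptions, not the actual implementation:
`my/word-boundary-p' is a hypothetical name, both characters are
assumed to be word constituents, and the "in one set but not in the
other" conditions are inferred from the explanation above.

```elisp
;; Hypothetical model of the word-boundary decision, not Emacs's own code.
(defun my/word-boundary-p (c1 c2)
  "Return non-nil if a word boundary is expected between C1 and C2."
  (let* ((same-script (eq (aref char-script-table c1)
                          (aref char-script-table c2)))
         ;; Same script: no boundary unless a separating pair matches.
         ;; Different scripts: boundary unless a combining pair matches.
         (pairs (if same-script
                    word-separating-categories
                  word-combining-categories))
         (default-result (not same-script))
         (set1 (char-category-set c1))
         (set2 (char-category-set c2))
         (matched nil))
    (dolist (pair pairs)
      (let ((cat1 (car pair))
            (cat2 (cdr pair)))
        ;; Assumption: a pair matches only when CAT1 is in C1's set but
        ;; not in C2's, and CAT2 is in C2's set but not in C1's.  A nil
        ;; member matches anything.
        (when (and (or (null cat1)
                       (and (aref set1 cat1) (not (aref set2 cat1))))
                   (or (null cat2)
                       (and (aref set2 cat2) (not (aref set1 cat2)))))
          (setq matched t))))
    (if matched (not default-result) default-result)))
```

Under this model a pair such as (?l . ?l) can never match, because no
category can be both present in and absent from the same set.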

This is the intended behavior, yes.  The word-combining-categories
feature is designed to support specific, rare situations where Far
Eastern scripts are mixed (e.g., the use of Kanji characters in
Japanese text), not arbitrary games with Latin and European scripts.

May I ask why you need to consider the above a single word?  In what
situation(s) does that make sense?

Thanks.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#24405; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


From: Oleksandr Gavenko <gavenkoa@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Subject: 24.5; Possibly ``forward-word`` doesn't respect
 ``word-combining-categories`` for word boundaries on changing between
 latin/phonetic scripts.
Date: Sat, 10 Sep 2016 11:33:45 +0300
Message-ID: <87mvjgupau.fsf@HIDDEN>

Evaluate the following form with C-x C-e:

  (let ((word-combining-categories '((?l . ?y) (?y . ?l) (?l . ?l)))
        (word-separating-categories nil))
    (forward-word))

  HelloПривLLжɪəʊheləʊaiɪa

Point stopped between ʊ and h.

I have:

  (aref char-script-table ?ʊ)  ; => phonetic
  (aref char-script-table ?h)  ; => latin
  (aref char-script-table ?ж)  ; => cyrillic

  (category-set-mnemonics (char-category-set ?ʊ))  ; => ".Ljl"
  (category-set-mnemonics (char-category-set ?h))  ; => ".Lalr"

  (category-docstring ?y)  ; => "Cyrillic"
  (category-docstring ?l)  ; => "Latin"

I expected point to move to the last character before the newline.

It seems that:

  (?l . ?y) (?y . ?l)

have an effect, because point moved across the Cyrillic/Latin and
Cyrillic/Phonetic boundaries but refused to cross the Latin/Phonetic
boundary.

If this is the intended behavior, how can I make Emacs move across
Latin/Phonetic boundaries?

See also:

  http://emacs.stackexchange.com/questions/21131/does-word-syntax-take-script-into-account

In GNU Emacs 24.5.1 (x86_64-pc-linux-gnu, GTK+ Version 3.18.6)
 of 2016-01-22 on binet, modified by Debian
Windowing system distributor `The X.Org Foundation', version 11.0.11803000
System Description:	Debian GNU/Linux testing (stretch)

-- 
http://defun.work/




Acknowledgement sent to Oleksandr Gavenko <gavenkoa@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs@HIDDEN. Full text available.
Report forwarded to bug-gnu-emacs@HIDDEN:
bug#24405; Package emacs. Full text available.
Last modified: Sat, 10 Sep 2016 10:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.