GNU bug report logs - #31204
25.3; Make word motion more customizable

Package: emacs

Reported by: Yuri Khan <yuri.v.khan <at> gmail.com>

Date: Wed, 18 Apr 2018 08:56:01 UTC

Severity: wishlist

Tags: moreinfo, wontfix

Found in version 25.3

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31204 in the body.
You can then email your comments to 31204 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#31204; Package emacs. (Wed, 18 Apr 2018 08:56:01 GMT)

Acknowledgement sent to Yuri Khan <yuri.v.khan <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 18 Apr 2018 08:56:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Yuri Khan <yuri.v.khan <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 25.3; Make word motion more customizable
Date: Wed, 18 Apr 2018 15:55:16 +0700

While trying to make the word motion commands (Ctrl+left/right, M-f/b)
behave more like those implemented in other editors:

http://lists.gnu.org/archive/html/help-gnu-emacs/2018-04/msg00230.html

I encountered a difficulty.

The function ‘forward-word’ behaves as follows:

1. Skip to the nearest character having word constituent syntax.
2. Skip to the nearest word boundary.

Step 2, by default, finds:

* a non-word-constituent character, OR
* a transition between two adjacent characters of different scripts
  (subject to exceptions controlled by ‘word-combining-categories’ and
  ‘word-separating-categories’),

whichever comes first.

Step 2 can also be customized by modifying
‘find-word-boundary-function-table’. This enables various useful
behaviors such as ‘subword-mode’, ‘superword-mode’, and possibly CJK
word breaking rules.

Step 1, on the other hand, is not customizable at all.
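
To make the two steps concrete, here is a minimal sketch in C. It is a toy
model over a plain ASCII string, not the actual `scan_words' code: the
names `is_word_char' and `model_forward_word' are made up for this
illustration, and "word constituent" is assumed to mean an ASCII letter or
digit, whereas Emacs consults the buffer's syntax table and the script
categories.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumption for this sketch: only ASCII letters and digits are word
   constituents.  Emacs looks this up in the syntax table instead.  */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c);
}

/* Model of a single forward word move over text[pos..limit).
   Returns the new position, or LIMIT if no word follows POS.  */
size_t model_forward_word(const char *text, size_t pos, size_t limit)
{
    assert(pos <= limit);

    /* Step 1: skip to the nearest word-constituent character.  */
    while (pos < limit && !is_word_char(text[pos]))
        pos++;

    /* Step 2: skip to the nearest word boundary.  In this model that
       is simply the first non-word character.  */
    while (pos < limit && is_word_char(text[pos]))
        pos++;

    return pos;
}
```

In this model, calling `model_forward_word' on "foo *** +++ (bar)" from
position 3 (just after "foo") returns 16 (just after "bar"), because step 1
passes over everything that is not a letter or digit.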


The specific behavior that I was trying to implement was to find the
nearest transition:

* from a word character to a non-word character, OR
* from a non-word non-whitespace character to a word character, OR
* from a non-word non-whitespace character to a whitespace character.

As an illustration (where ‘|’ specifies word motion stops when going
left to right):

    foo| ***| +++| (|bar|)|
       ^

When the cursor is after ‘foo’, step 1 of ‘forward-word’ skips to directly
before ‘bar’, missing two stops.
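
The three transitions listed above can be sketched as a small C function.
This is only an illustration under stated assumptions: `classify',
`is_stop' and `next_stop' are hypothetical names, characters are sorted into
word / whitespace / other using the ASCII `isalnum' and `isspace' tests
rather than an Emacs syntax table, and the end of the text is treated as a
stop.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum char_class { CLASS_WORD, CLASS_SPACE, CLASS_PUNCT };

/* Assumption: ASCII-only classification standing in for syntax classes.  */
static enum char_class classify(char c)
{
    unsigned char u = (unsigned char)c;
    if (isalnum(u))
        return CLASS_WORD;
    if (isspace(u))
        return CLASS_SPACE;
    return CLASS_PUNCT;   /* non-word, non-whitespace */
}

/* Non-zero if moving from a character of class BEFORE to one of class
   AFTER is one of the three transitions described in the report.  */
static int is_stop(enum char_class before, enum char_class after)
{
    if (before == CLASS_WORD)
        return after != CLASS_WORD;            /* word -> non-word */
    if (before == CLASS_PUNCT)
        return after == CLASS_WORD             /* punct -> word */
            || after == CLASS_SPACE;           /* punct -> whitespace */
    return 0;                                  /* whitespace never ends a unit */
}

/* Return the nearest stop strictly after POS, or LIMIT if there is none.  */
size_t next_stop(const char *text, size_t pos, size_t limit)
{
    assert(pos <= limit);
    for (size_t i = pos + 1; i < limit; i++)
        if (is_stop(classify(text[i - 1]), classify(text[i])))
            return i;
    return limit;
}
```

Applied repeatedly to "foo *** +++ (bar)" starting at position 0, this
sketch stops at 3, 7, 11, 13, 16 and 17, which are the six ‘|’ marks in the
illustration.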

As a result, implementing the desired behavior requires either:

* defining separate functions ‘my-forward-word’, ‘my-backward-word’,
  ‘my-left-word’, ‘my-right-word’, ‘my-kill-word’,
  ‘my-backward-kill-word’, and possibly more, and remapping their key
  bindings; OR

* advising ‘forward-word’ with an :override.


Perhaps it would be nice to have an optional hook for step 1 of
‘forward-word’: a function that would take two arguments, POS and LIMIT,
and return the starting word boundary position from which step 2 would
then work.
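
As a rough sketch of the shape such a hook could take, here is a toy C
model. Everything in it is an assumption made for illustration: the type
`find_word_start_fn', the example hook `skip_whitespace_only', and the
driver `forward_word_hooked' are hypothetical names, the text is a plain
ASCII string, and step 2 is reduced to "skip letters and digits".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical hook type: given POS and LIMIT, return the position
   from which step 2 should start.  */
typedef size_t (*find_word_start_fn)(const char *text,
                                     size_t pos, size_t limit);

/* Assumption: ASCII letters and digits are the word constituents.  */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c);
}

/* Default step 1: skip to the nearest word-constituent character.  */
static size_t default_word_start(const char *text, size_t pos, size_t limit)
{
    while (pos < limit && !is_word_char(text[pos]))
        pos++;
    return pos;
}

/* Example hook: skip whitespace only, so that a run of punctuation
   also counts as a place where a "word" may start.  */
size_t skip_whitespace_only(const char *text, size_t pos, size_t limit)
{
    while (pos < limit && isspace((unsigned char)text[pos]))
        pos++;
    return pos;
}

/* One forward move: step 1 via HOOK if non-NULL, else the default;
   then the unchanged step 2.  */
size_t forward_word_hooked(const char *text, size_t pos, size_t limit,
                           find_word_start_fn hook)
{
    assert(pos <= limit);
    size_t start = hook ? hook(text, pos, limit)
                        : default_word_start(text, pos, limit);

    /* Treat an out-of-range answer from the hook as "no word found".  */
    if (start < pos || start > limit)
        return limit;

    /* Step 2: skip to the first non-word character.  */
    while (start < limit && is_word_char(text[start]))
        start++;
    return start;
}
```

With a NULL hook the model moves from position 3 of "foo *** +++ (bar)" to
16, as before. With `skip_whitespace_only' it moves from 3 to 4, the start
of "***", and stays there, because step 2 in this model only advances over
letters and digits; reaching the stop after "***" would presumably need a
matching change to step 2 as well.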


In GNU Emacs 25.3.2 (x86_64-pc-linux-gnu, GTK+ Version 3.18.9)
 of 2017-09-13 built on lcy01-32
Windowing system distributor 'The X.Org Foundation', version 11.0.11905000
System Description:    Ubuntu 16.04.4 LTS

Configured using:
 'configure --build=x86_64-linux-gnu --prefix=/usr
 '--includedir=${prefix}/include' '--mandir=${prefix}/share/man'
 '--infodir=${prefix}/share/info' --sysconfdir=/etc --localstatedir=/var
 --disable-silent-rules '--libdir=${prefix}/lib/x86_64-linux-gnu'
 '--libexecdir=${prefix}/lib/x86_64-linux-gnu' --disable-maintainer-mode
 --disable-dependency-tracking --prefix=/usr --sharedstatedir=/var/lib
 --program-suffix=25 --with-modules --with-x=yes --with-x-toolkit=gtk3
 'CFLAGS=-g -O2 -fstack-protector-strong -Wformat
 -Werror=format-security' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro''

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GCONF GSETTINGS
NOTIFY LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES

Important settings:
  value of $LC_MONETARY: en_US.UTF-8
  value of $LC_NUMERIC: en_US.UTF-8
  value of $LC_TIME: en_DK.utf8
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message dired format-spec rfc822 mml
mml-sec password-cache epg epg-config gnus-util mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail
rfc2047 rfc2045 ietf-drums mm-util help-fns help-mode easymenu
cl-loaddefs pcase cl-lib mail-prsvr mail-utils time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt
fringe tabulated-list newcomment elisp-mode lisp-mode prog-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese charscript case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote dbusbind inotify dynamic-setting
system-font-setting font-render-setting move-toolbar gtk x-toolkit x
multi-tty make-network-process emacs)

Memory information:
((conses 16 86338 5928)
 (symbols 48 19769 0)
 (miscs 40 49 121)
 (strings 32 14363 4733)
 (string-bytes 1 409522)
 (vectors 16 11755)
 (vector-slots 8 430899 3852)
 (floats 8 166 64)
 (intervals 56 231 0)
 (buffers 976 18)
 (heap 1024 33279 1050))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31204; Package emacs. (Wed, 18 May 2022 12:04:02 GMT)

Message #8 received at 31204 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Yuri Khan <yuri.v.khan <at> gmail.com>
Cc: 31204 <at> debbugs.gnu.org
Subject: Re: bug#31204: 25.3; Make word motion more customizable
Date: Wed, 18 May 2022 14:03:42 +0200

Yuri Khan <yuri.v.khan <at> gmail.com> writes:

> While trying to make word motion commands (Ctrl+left/right, M-f/b) more
> similar to that implemented in other editors:
>
> http://lists.gnu.org/archive/html/help-gnu-emacs/2018-04/msg00230.html
>
> I encountered a difficulty.

[...]

> Step 1, on the other hand, is not customizable at all.
>
> The specific behavior that I was trying to implement was to find the
> nearest transition:
>
> * from a word character to a non-word character, OR
> * from a non-word non-whitespace character to a word character, OR
> * from a non-word non-whitespace character to a whitespace character.
>
> As an illustration (where ‘|’ specifies word motion stops when going
> left to right):
>
>     foo| ***| +++| (|bar|)|
>        ^

[...]

> Perhaps it would be nice to have an optional hook for step 1 of
> ‘forward-word’, a function that would take two arguments POS and LIMIT,
> and returning the starting word boundary position from which step 2 would
> then work.

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

I think this sounds like it could be useful.  If we added such a hook to
`forward-word', what would the rest of the code look like to make
`C-<right>' work this way?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Added tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 18 May 2022 12:04:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31204; Package emacs. (Wed, 15 Jun 2022 15:04:01 GMT)

Message #13 received at 31204 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Yuri Khan <yuri.v.khan <at> gmail.com>
Cc: 31204 <at> debbugs.gnu.org
Subject: Re: bug#31204: 25.3; Make word motion more customizable
Date: Wed, 15 Jun 2022 17:03:16 +0200

Lars Ingebrigtsen <larsi <at> gnus.org> writes:

>> As an illustration (where ‘|’ specifies word motion stops when going
>> left to right):
>>
>>     foo| ***| +++| (|bar|)|

[...]

> I think this sounds like it could be useful.  If we added such a hook to
> `forward-word', what would the rest of the code look like to make
> `C-<right>' work this way?

I've played a bit with the patch below, but I tend to think that this is
going about things the wrong way.  That is, for something like this to
work meaningfully, it would require a lot of setup (because
Vfind_word_boundary_function_table would also have to be altered in
conjunction with this).

I.e., it's really about changing the definition of what a "word" is, and
in that case, I think it would be easier to just do that in a syntax
table, and then everything would work automatically.

(Or by advising the functions here.)

So I don't think it'd be worth it to proceed with something like the
below, and I'm therefore closing this bug report.

diff --git a/src/syntax.c b/src/syntax.c
index f9022d18d2..02d4dd4b9a 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1462,20 +1462,33 @@ scan_words (ptrdiff_t from, EMACS_INT count)
 
   while (count > 0)
     {
-      while (true)
+      if (!NILP (Vfind_word_start_function))
 	{
-	  if (from == end)
+	  Lisp_Object np = call2 (Vfind_word_start_function,
+				  make_fixnum (from), make_fixnum (end));
+	  if (!FIXNUMP (np))
 	    return 0;
-	  UPDATE_SYNTAX_TABLE_FORWARD (from);
+	  from = XFIXNUM (np);
+	  from_byte = CHAR_TO_BYTE (from);
 	  ch0 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
-	  code = SYNTAX (ch0);
-	  inc_both (&from, &from_byte);
-	  if (words_include_escapes
-	      && (code == Sescape || code == Scharquote))
-	    break;
-	  if (code == Sword)
-	    break;
-	  rarely_quit (from);
+	}
+      else
+	{
+	  while (true)
+	    {
+	      if (from == end)
+		return 0;
+	      UPDATE_SYNTAX_TABLE_FORWARD (from);
+	      ch0 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+	      code = SYNTAX (ch0);
+	      inc_both (&from, &from_byte);
+	      if (words_include_escapes
+		  && (code == Sescape || code == Scharquote))
+		break;
+	      if (code == Sword)
+		break;
+	      rarely_quit (from);
+	    }
 	}
       /* Now CH0 is a character which begins a word and FROM is the
          position of the next character.  */
@@ -3792,6 +3805,12 @@ syms_of_syntax (void)
 In both cases, LIMIT bounds the search. */);
   Vfind_word_boundary_function_table = Fmake_char_table (Qnil, Qnil);
 
+  DEFVAR_LISP ("find-word-start-function",
+	       Vfind_word_start_function,
+	       doc: /* Function called to find the start of a word.
+It's called with two parameters, POS and LIMIT.  */);
+  Vfind_word_start_function = Qnil;
+
   DEFVAR_BOOL ("comment-end-can-be-escaped", comment_end_can_be_escaped,
                doc: /* Non-nil means an escaped ender inside a comment doesn't end the comment.  */);
   comment_end_can_be_escaped = false;


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Added tag(s) wontfix. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 15 Jun 2022 15:04:02 GMT)

bug closed, send any further explanations to 31204 <at> debbugs.gnu.org and Yuri Khan <yuri.v.khan <at> gmail.com> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 15 Jun 2022 15:04:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 14 Jul 2022 11:24:04 GMT)

This bug report was last modified 1 year and 285 days ago.


