GNU bug report logs - #21449
Emacs lisp mode: incorrect fontification of symbols containing escaped characters.

Package: emacs

Reported by: Alan Mackenzie <acm <at> muc.de>

Date: Wed, 9 Sep 2015 20:17:01 UTC

Severity: minor

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 21449 in the body.
You can then email your comments to 21449 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#21449; Package emacs. (Wed, 09 Sep 2015 20:17:01 GMT)

Acknowledgement sent to Alan Mackenzie <acm <at> muc.de>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 09 Sep 2015 20:17:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Alan Mackenzie <acm <at> muc.de>
To: bug-gnu-emacs <at> gnu.org
Subject: Emacs lisp mode: incorrect fontification of symbols containing
 escaped characters.
Date: Wed, 9 Sep 2015 20:10:42 +0000

Hello, Emacs.

In Emacs lisp mode, write something like:

    (defun fix-re--RA|RB->R\(A|B\) (foo bar) ...)
                           ^^^^^^^

The part of the symbol indicated remains unfontified.  This happens
because the fontification patterns in .../lisp/emacs-lisp/lisp-mode.el
don't take account of escaped characters in symbol names.

By replacing lots of "\\(?:\\sw\\|\\s_\\)" by
"\\(?:\\sw\\|\\s_\\|\\\\.\\)", the fontification is repaired.

As a bonus, this fix makes imenu work properly with these symbols too.

Unfortunately, as yet the regular expression constructs "\\_<" and
"\\_>" don't work properly with these symbols.  To fix that would need
an amendment to .../src/regex.c, with possibly .../src/syntax.c needing
one too.  It feels like there really ought to be some major mode
dependent flag saying whether or not escaped characters are valid in
identifiers.  They are in Emacs lisp, but they're not in C.

Anyhow, here's the patch:



diff --git a/lisp/emacs-lisp/lisp-mode.el b/lisp/emacs-lisp/lisp-mode.el
index 8aa34c7..7be7cb3 100644
--- a/lisp/emacs-lisp/lisp-mode.el
+++ b/lisp/emacs-lisp/lisp-mode.el
@@ -110,7 +110,7 @@
                                 ;; CLOS and EIEIO
 				"defgeneric" "defmethod")
                               t))
-			   "\\s-+\\(\\(\\sw\\|\\s_\\)+\\)"))
+			   "\\s-+\\(\\(\\sw\\|\\s_\\|\\\\.\\)+\\)"))
 	 2)
    (list (purecopy "Variables")
 	 (purecopy (concat "^\\s-*("
@@ -122,11 +122,11 @@
                                 "defconstant"
 				"defparameter" "define-symbol-macro")
                               t))
-			   "\\s-+\\(\\(\\sw\\|\\s_\\)+\\)"))
+			   "\\s-+\\(\\(\\sw\\|\\s_\\|\\\\.\\)+\\)"))
 	 2)
    ;; For `defvar', we ignore (defvar FOO) constructs.
    (list (purecopy "Variables")
-	 (purecopy (concat "^\\s-*(defvar\\s-+\\(\\(\\sw\\|\\s_\\)+\\)"
+	 (purecopy (concat "^\\s-*(defvar\\s-+\\(\\(\\sw\\|\\s_\\|\\\\.\\)+\\)"
 			   "[[:space:]\n]+[^)]"))
 	 1)
    (list (purecopy "Types")
@@ -143,7 +143,7 @@
                                 ;; CLOS and EIEIO
                                 "defclass")
                               t))
-			   "\\s-+'?\\(\\(\\sw\\|\\s_\\)+\\)"))
+			   "\\s-+'?\\(\\(\\sw\\|\\s_\\|\\\\.\\)+\\)"))
 	 2))
 
   "Imenu generic expression for Lisp mode.  See `imenu-generic-expression'.")
@@ -220,7 +220,7 @@
 (defun lisp--el-match-keyword (limit)
   ;; FIXME: Move to elisp-mode.el.
   (catch 'found
-    (while (re-search-forward "(\\(\\(?:\\sw\\|\\s_\\)+\\)\\_>" limit t)
+    (while (re-search-forward "(\\(\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\)\\_>" limit t)
       (let ((sym (intern-soft (match-string 1))))
 	(when (or (special-form-p sym)
 		  (and (macrop sym)
@@ -349,7 +349,7 @@
                   ;; Any whitespace and defined object.
                   "[ \t']*"
                   "\\(([ \t']*\\)?" ;; An opening paren.
-                  "\\(\\(setf\\)[ \t]+\\(?:\\sw\\|\\s_\\)+\\|\\(?:\\sw\\|\\s_\\)+\\)?")
+                  "\\(\\(setf\\)[ \t]+\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\|\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\)?")
           (1 font-lock-keyword-face)
           (3 (let ((type (get (intern-soft (match-string 1)) 'lisp-define-type)))
                (cond ((eq type 'var) font-lock-variable-name-face)
@@ -373,7 +373,7 @@
                   ;; Any whitespace and defined object.
                   "[ \t']*"
                   "\\(([ \t']*\\)?" ;; An opening paren.
-                  "\\(\\(setf\\)[ \t]+\\(?:\\sw\\|\\s_\\)+\\|\\(?:\\sw\\|\\s_\\)+\\)?")
+                  "\\(\\(setf\\)[ \t]+\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\|\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\)?")
           (1 font-lock-keyword-face)
           (3 (let ((type (get (intern-soft (match-string 1)) 'lisp-define-type)))
                (cond ((eq type 'var) font-lock-variable-name-face)
@@ -395,22 +395,22 @@
          (lisp--el-match-keyword . 1)
          ;; Exit/Feature symbols as constants.
          (,(concat "(\\(catch\\|throw\\|featurep\\|provide\\|require\\)\\_>"
-                   "[ \t']*\\(\\(?:\\sw\\|\\s_\\)+\\)?")
+                   "[ \t']*\\(\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\)?")
            (1 font-lock-keyword-face)
            (2 font-lock-constant-face nil t))
          ;; Erroneous structures.
          (,(concat "(" el-errs-re "\\_>")
            (1 font-lock-warning-face))
          ;; Words inside \\[] tend to be for `substitute-command-keys'.
-         ("\\\\\\\\\\[\\(\\(?:\\sw\\|\\s_\\)+\\)\\]"
+         ("\\\\\\\\\\[\\(\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\)\\]"
           (1 font-lock-constant-face prepend))
          ;; Words inside ‘’ and '' and `' tend to be symbol names.
-         ("['`‘]\\(\\(?:\\sw\\|\\s_\\)\\(?:\\sw\\|\\s_\\)+\\)['’]"
+         ("['`‘]\\(\\(?:\\sw\\|\\s_\\|\\\\.\\)\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\)['’]"
           (1 font-lock-constant-face prepend))
          ;; Constant values.
-         ("\\_<:\\(?:\\sw\\|\\s_\\)+\\_>" 0 font-lock-builtin-face)
+         ("\\_<:\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\_>" 0 font-lock-builtin-face)
          ;; ELisp and CLisp `&' keywords as types.
-         ("\\_<\\&\\(?:\\sw\\|\\s_\\)+\\_>" . font-lock-type-face)
+         ("\\_<\\&\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\_>" . font-lock-type-face)
          ;; ELisp regexp grouping constructs
          (,(lambda (bound)
              (catch 'found
@@ -447,19 +447,19 @@
          (,(concat "(" cl-kws-re "\\_>") . 1)
          ;; Exit/Feature symbols as constants.
          (,(concat "(\\(catch\\|throw\\|provide\\|require\\)\\_>"
-                   "[ \t']*\\(\\(?:\\sw\\|\\s_\\)+\\)?")
+                   "[ \t']*\\(\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\)?")
            (1 font-lock-keyword-face)
            (2 font-lock-constant-face nil t))
          ;; Erroneous structures.
          (,(concat "(" cl-errs-re "\\_>")
            (1 font-lock-warning-face))
          ;; Words inside ‘’ and '' and `' tend to be symbol names.
-         ("['`‘]\\(\\(?:\\sw\\|\\s_\\)\\(?:\\sw\\|\\s_\\)+\\)['’]"
+         ("['`‘]\\(\\(?:\\sw\\|\\s_\\|\\\\.\\)\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\)['’]"
           (1 font-lock-constant-face prepend))
          ;; Constant values.
-         ("\\_<:\\(?:\\sw\\|\\s_\\)+\\_>" 0 font-lock-builtin-face)
+         ("\\_<:\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\_>" 0 font-lock-builtin-face)
          ;; ELisp and CLisp `&' keywords as types.
-         ("\\_<\\&\\(?:\\sw\\|\\s_\\)+\\_>" . font-lock-type-face)
+         ("\\_<\\&\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\_>" . font-lock-type-face)
          ;; This is too general -- rms.
          ;; A user complained that he has functions whose names start with `do'
          ;; and that they get the wrong color.
@@ -482,7 +482,7 @@
   (let* ((firstsym (and listbeg
                         (save-excursion
                           (goto-char listbeg)
-                          (and (looking-at "([ \t\n]*\\(\\(\\sw\\|\\s_\\)+\\)")
+                          (and (looking-at "([ \t\n]*\\(\\(\\sw\\|\\s_\\|\\\\.\\)+\\)")
                                (match-string 1)))))
          (docelt (and firstsym
                       (function-get (intern-soft firstsym)



-- 
Alan Mackenzie (Nuremberg, Germany).
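
The substitution proposed in the message above can be checked outside
font-lock with a small sketch.  The helper `my/symbol-match' and the two
`my/...-symbol-re' constants are hypothetical names introduced only for
this illustration; the two regexps are the "before" and "after" fragments
from the patch, and the Emacs Lisp syntax table is assumed to be the one
in effect.

```elisp
;; Hypothetical helper, not part of lisp-mode.el.
(defun my/symbol-match (regexp text)
  "Return the prefix of TEXT matched by REGEXP under Emacs Lisp syntax."
  (with-temp-buffer
    ;; \\sw and \\s_ depend on the syntax table, so use the elisp one.
    (with-syntax-table emacs-lisp-mode-syntax-table
      (insert text)
      (goto-char (point-min))
      (and (looking-at regexp)
           (match-string 0)))))

;; Fragment used before the patch: word or symbol constituents only.
(defconst my/old-symbol-re "\\(?:\\sw\\|\\s_\\)+")
;; Fragment used after the patch: also accept a backslash plus one char.
(defconst my/new-symbol-re "\\(?:\\sw\\|\\s_\\|\\\\.\\)+")

(my/symbol-match my/old-symbol-re "fix-re--RA|RB->R\\(A|B\\)")
;; => "fix-re--RA|RB->R"
(my/symbol-match my/new-symbol-re "fix-re--RA|RB->R\\(A|B\\)")
;; => "fix-re--RA|RB->R\\(A|B\\)"
```

The old fragment stops at the first backslash, which is why the tail of
the symbol was left unfontified; the new fragment consumes each
backslash together with the character it escapes.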




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21449; Package emacs. (Thu, 10 Sep 2015 03:31:02 GMT)

Message #8 received at 21449 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Alan Mackenzie <acm <at> muc.de>
Cc: 21449 <at> debbugs.gnu.org
Subject: Re: bug#21449: Emacs lisp mode: incorrect fontification of symbols
 containing escaped characters.
Date: Wed, 09 Sep 2015 23:30:31 -0400

> one too.  It feels like there really ought to be some major mode
> dependent flag saying whether or not escaped characters are valid in
> identifiers.  They are in Emacs lisp, but they're not in C.

You mean like words-include-escapes?


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21449; Package emacs. (Thu, 10 Sep 2015 10:44:01 GMT)

Message #11 received at 21449 <at> debbugs.gnu.org:

From: Alan Mackenzie <acm <at> muc.de>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 21449 <at> debbugs.gnu.org
Subject: Re: bug#21449: Emacs lisp mode: incorrect fontification of symbols
 containing escaped characters.
Date: Thu, 10 Sep 2015 10:44:34 +0000

Hello, Stefan.

On Wed, Sep 09, 2015 at 11:30:31PM -0400, Stefan Monnier wrote:
> > one too.  It feels like there really ought to be some major mode
> > dependent flag saying whether or not escaped characters are valid in
> > identifiers.  They are in Emacs lisp, but they're not in C.

> You mean like words-include-escapes?

Something like that, yes.

I don't think words-include-escapes is the right thing to use, though.
I think that doing M-f on "R\(A|B\)", one would want point to move to
just after the R, not just after the A; escaped characters should be
word separators, just like -s are; they should be thought of as \\s_
rather than \\sw.

Something like `identifiers-include-escapes', maybe?

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).
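
The word-motion question raised above can be explored with a short
sketch.  `words-include-escapes' is the existing variable Stefan refers
to; `my/forward-word-end' is a hypothetical helper added only for this
illustration, and the Emacs Lisp syntax table is assumed.  Only the
result with the flag unset is stated here; how far the motion goes with
the flag set depends on how the scanner treats the escaped character, so
it is left for the reader to evaluate.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my/forward-word-end (text include-escapes)
  "Return the part of TEXT skipped by one `forward-word' from its start.
INCLUDE-ESCAPES is bound to `words-include-escapes' during the motion."
  (with-temp-buffer
    (with-syntax-table emacs-lisp-mode-syntax-table
      (insert text)
      (goto-char (point-min))
      (let ((words-include-escapes include-escapes))
        (forward-word 1))
      (buffer-substring (point-min) (point)))))

;; Flag unset: the backslash ends the word, so point stops after "R".
(my/forward-word-end "R\\(A|B\\)" nil)
;; => "R"

;; Flag set: evaluate to see how much further the motion goes.
(my/forward-word-end "R\\(A|B\\)" t)
```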




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21449; Package emacs. (Fri, 11 Sep 2015 14:49:01 GMT)

Message #14 received at 21449 <at> debbugs.gnu.org:

From: Alan Mackenzie <acm <at> muc.de>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 21449 <at> debbugs.gnu.org
Subject: Re: bug#21449: Emacs lisp mode: incorrect fontification of symbols
 containing escaped characters.
Date: Fri, 11 Sep 2015 14:49:22 +0000

Hello, Stefan.

On Wed, Sep 09, 2015 at 11:30:31PM -0400, Stefan Monnier wrote:
> > one too.  It feels like there really ought to be some major mode
> > dependent flag saying whether or not escaped characters are valid in
> > identifiers.  They are in Emacs lisp, but they're not in C.

> You mean like words-include-escapes?

Is there any objection to me installing the patch to lisp-mode.el?

It inserts "\\|\\\\." into each font-locking regexp which contains a bit
looking like "(?:\\sw\\|\\s_", so that escaped characters will be picked
up.  There are no other changes.

I think it unlikely this will do any damage.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21449; Package emacs. (Fri, 11 Sep 2015 16:57:03 GMT)

Message #17 received at 21449 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Alan Mackenzie <acm <at> muc.de>
Cc: 21449 <at> debbugs.gnu.org
Subject: Re: bug#21449: Emacs lisp mode: incorrect fontification of symbols
 containing escaped characters.
Date: Fri, 11 Sep 2015 12:56:47 -0400

> Is there any objection to me installing the patch to lisp-mode.el?

The resulting regexps are harder to read, for a very small benefit since
identifiers with backslashes should be avoided for the sanity of the
human reader anyway.

But if you like it, feel free to install it.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21449; Package emacs. (Thu, 31 Oct 2019 17:01:01 GMT)

Message #20 received at 21449 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: 21449 <at> debbugs.gnu.org
Subject: Re: Emacs lisp mode: incorrect fontification of symbols containing
 escaped characters.
Date: Thu, 31 Oct 2019 18:00:32 +0100

Alan Mackenzie <acm <at> muc.de> writes:

>           ;; Constant values.
> -         ("\\_<:\\(?:\\sw\\|\\s_\\)+\\_>" 0 font-lock-builtin-face)
> +         ("\\_<:\\(?:\\sw\\|\\s_\\|\\\\.\\)+\\_>" 0 font-lock-builtin-face)

This code has changed a lot since this was reported:

         ;; Constant values.
         (,(concat "\\_<:" lisp-mode-symbol-regexp "\\_>")
          (0 font-lock-builtin-face))

But:

lisp-mode-symbol-regexp
=> "\\(?:\\sw\\|\\s_\\|\\\\.\\)+"

So it basically looks like this was added in some form or other, and the
test case fontifies correctly for me now, so I'm going to go ahead and
guess that this works as it's supposed to now, and I'm closing this bug
report.

Please reopen if it's still an issue.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
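
The check described in the closing message ("the test case fontifies
correctly for me now") can be repeated non-interactively with a sketch
along the following lines.  `my/face-at-substring' is a hypothetical
helper introduced for this illustration; it assumes an Emacs recent
enough to provide `font-lock-ensure' and the `lisp-mode-symbol-regexp'
based keywords quoted above.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my/face-at-substring (code substring)
  "Fontify CODE in `emacs-lisp-mode'; return the face where SUBSTRING starts."
  (with-temp-buffer
    (insert code)
    (emacs-lisp-mode)
    ;; Fontify the whole buffer even though font-lock-mode is not on.
    (font-lock-ensure)
    (goto-char (point-min))
    (when (search-forward substring nil t)
      (get-text-property (match-beginning 0) 'face))))

;; Compare the face at the start of the symbol with the face at the
;; escaped part that the original report marked as unfontified.
(let ((code "(defun fix-re--RA|RB->R\\(A|B\\) (foo bar) nil)"))
  (list (my/face-at-substring code "fix-re")
        (my/face-at-substring code "\\(A|B")))
```

If the fix is in place, both elements of the returned list should be the
same non-nil face; a nil second element would reproduce the original
report.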




Bug closed; send any further explanations to 21449 <at> debbugs.gnu.org and Alan Mackenzie <acm <at> muc.de>. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 31 Oct 2019 17:01:03 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 29 Nov 2019 12:24:06 GMT)

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.