GNU bug report logs - #38104
27.0.50; elixir-mode fontification is very slow


Package: emacs;

Reported by: Dmitry Gutov <dgutov <at> yandex.ru>

Date: Thu, 7 Nov 2019 15:41:02 UTC

Severity: normal

Found in version 27.0.50

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 38104 in the body.
You can then email your comments to 38104 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#38104; Package emacs. (Thu, 07 Nov 2019 15:41:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Dmitry Gutov <dgutov <at> yandex.ru>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 07 Nov 2019 15:41:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.50; elixir-mode fontification is very slow
Date: Thu, 7 Nov 2019 17:40:11 +0200
[Message part 1 (text/plain, inline)]
I haven't been able to track this to a particular component (e.g. a
regexp) for now, but font-lock-fontify-region is now considerably slower
than it was in Emacs 26 (at least at revision cb8fb597e5bf4f14).

To reproduce: install elixir-mode (e.g. from MELPA Stable):

(add-to-list 'package-archives
             '("melpa-stable" . "https://stable.melpa.org/packages/") t)

M-x list-packages, install elixir-mode.

Save the attached tiny.__ex__ as tiny.ex.

Visit tiny.ex.

Eval: (benchmark 1 '(font-lock-fontify-region (point-min) (point-max))).

"Elapsed time: 0.158824s"

With larger files, the times are much longer.
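
For comparing builds, a single run can be noisy. A minimal sketch of averaging several runs follows; the function name my/fontify-benchmark and the default of 5 repetitions are assumptions for illustration, not part of the report. It relies only on the standard benchmark-run macro, which returns a list whose first element is the total elapsed time in seconds.

```elisp
(require 'benchmark)

;; Hypothetical helper, not from the original report.
(defun my/fontify-benchmark (file &optional repetitions)
  "Visit FILE and time `font-lock-fontify-region' over the whole buffer.
Return the average elapsed seconds per run over REPETITIONS runs
\(default 5)."
  (let ((n (or repetitions 5)))
    (with-current-buffer (find-file-noselect file)
      ;; `benchmark-run' accepts a symbol as its repetition count.
      (/ (car (benchmark-run n
                (font-lock-fontify-region (point-min) (point-max))))
         (float n)))))
```

Evaluating (my/fontify-benchmark "tiny.ex") on both an Emacs 26 and a master build should then give comparable per-run figures.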

I had a break from Elixir, so I noticed this only now.

In GNU Emacs 27.0.50 (build 11, x86_64-pc-linux-gnu, GTK+ Version 3.24.8)
 of 2019-11-05 built on potemkin
Repository revision: dd19cc3aa16ccc441a8a2bfcdeb3005a6eef2543
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12004000
System Description: Ubuntu 19.04
[tiny.__ex__ (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38104; Package emacs. (Tue, 26 Nov 2019 16:27:02 GMT) Full text and rfc822 format available.

Message #8 received at 38104 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: 38104 <at> debbugs.gnu.org
Cc: Mattias Engdegård <mattiase <at> acm.org>
Subject: Re: bug#38104: 27.0.50; elixir-mode fontification is very slow
Date: Tue, 26 Nov 2019 18:26:13 +0200
I did a 'git bisect', and it came down to:

  commit 2ed71227c626c6cfdc684948644ccf3d9eaeb15b
  Author: Mattias Engdegård <mattiase <at> acm.org>
  Date:   Wed Sep 25 14:29:50 2019 -0700

      New rx implementation

Mattias, could you look into it?

elixir-mode does use rx, heavily. Albeit with a thin wrapper.

To be clear, elixir-mode is quite unusable now.






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38104; Package emacs. (Tue, 26 Nov 2019 16:32:02 GMT) Full text and rfc822 format available.

Message #11 received at 38104 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: 38104 <at> debbugs.gnu.org
Cc: Mattias Engdegård <mattiase <at> acm.org>
Subject: Re: bug#38104: 27.0.50; elixir-mode fontification is very slow
Date: Tue, 26 Nov 2019 18:30:58 +0200
On 26.11.2019 18:26, Dmitry Gutov wrote:
> elixir-mode does use rx, heavily. Albeit with a thin wrapper.

And one more thing: this wrapper makes use of rx-constituents.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38104; Package emacs. (Tue, 26 Nov 2019 17:00:02 GMT) Full text and rfc822 format available.

Message #14 received at 38104 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 38104 <at> debbugs.gnu.org
Subject: Re: bug#38104: 27.0.50; elixir-mode fontification is very slow
Date: Tue, 26 Nov 2019 17:59:06 +0100
26 nov. 2019 kl. 17.26 skrev Dmitry Gutov <dgutov <at> yandex.ru>:

> Mattias, could you look into it?

Thanks, will have a look. By the way, when byte-compiling I get complaints about 'looking-back' being called with too few arguments in elixir-smie.el; may want to do something about that.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38104; Package emacs. (Tue, 26 Nov 2019 17:05:01 GMT) Full text and rfc822 format available.

Message #17 received at 38104 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 38104 <at> debbugs.gnu.org
Subject: Re: bug#38104: 27.0.50; elixir-mode fontification is very slow
Date: Tue, 26 Nov 2019 19:03:16 +0200
On 26.11.2019 18:59, Mattias Engdegård wrote:
> By the way, when byte-compiling I get complaints about 'looking-back' being called with too few arguments in elixir-smie.el; may want to do something about that.

Yes, I have a few other pending patches for that project as well.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38104; Package emacs. (Tue, 26 Nov 2019 19:33:01 GMT) Full text and rfc822 format available.

Message #20 received at 38104 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 38104 <at> debbugs.gnu.org
Subject: Re: bug#38104: 27.0.50; elixir-mode fontification is very slow
Date: Tue, 26 Nov 2019 20:32:29 +0100
26 nov. 2019 kl. 17.26 skrev Dmitry Gutov <dgutov <at> yandex.ru>:

> elixir-mode does use rx, heavily. Albeit with a thin wrapper.

As it turned out, rx is fine (now); elixir-mode, not quite. In elixir-mode.el, we have

      (identifiers . ,(rx (one-or-more (any "A-Z" "a-z" "_"))
                          (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
                          (optional (or "?" "!"))))

First, this regex is suboptimal: the first character of an identifier should occur exactly once, or you get bad backtracking behaviour. Just remove the one-or-more construct:

      (identifiers . ,(rx (any "A-Z" "a-z" "_")
                          (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
                          (optional (or "?" "!"))))

This definition is then used in several places, but two in particular are of interest to us:

    ;; Module attributes
    (,(elixir-rx (and "@" (1+ identifiers)))

The construct (1+ identifiers) was perhaps meant to match multiple identifiers, but it doesn't (no separator); it just matches an identifier in several ways, which again leads to bad backtracking behaviour.
The same problem here:

    ;; Map keys
    (,(elixir-rx (group (and (one-or-more identifiers) ":")) space)

Remove the 1+ and one-or-more and it's fast again.

Why did this "work" with the old rx implementation? Because that code had a nasty bug: it does not bracket definitions in rx-constituents properly. Example:

(let ((rx-constituents (cons '(hello . "HELLO") rx-constituents)))
  (rx-to-string '(1+ hello) t))
=> "HELLO+"

The new rx implementation does not suffer from this bug.
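
To make the difference concrete, here is a small sketch that builds both forms by hand with concat instead of going through rx. The function name my/bracketing-demo is hypothetical; the regexps are the unbracketed "HELLO+" from the example above and the bracketed form that a correct (1+ hello) is expected to mean.

```elisp
;; Hypothetical demo function, not part of rx or elixir-mode.
(defun my/bracketing-demo ()
  "Contrast an unbracketed \"+\" with a properly bracketed one.
Return a list of three `string-match-p' results."
  (let ((hello "HELLO")
        (case-fold-search nil))
    (list
     ;; Unbracketed: the regexp is HELLO+, so + repeats only the final O.
     (string-match-p (concat "\\`" hello "+\\'") "HELLOOO")
     ;; For the same reason it cannot match the whole word twice.
     (string-match-p (concat "\\`" hello "+\\'") "HELLOHELLO")
     ;; Bracketed: \(?:HELLO\)+ repeats the whole word.
     (string-match-p (concat "\\`\\(?:" hello "\\)+\\'") "HELLOHELLO"))))

(my/bracketing-demo) ; => (0 nil 0)
```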

The result in your case is that the old rx, when translating (1+ identifiers), only tacked the "+" onto whatever regexp 'identifiers' produced, resulting in

"[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?+"

which is a lot faster, since only the final [!?] is repeated twice (and it probably doesn't match very often).
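
The effect of the nested repetition can be tried in isolation. The sketch below is only an illustration: the my/ names are hypothetical, the regexps are written out as plain strings rather than generated by rx, and the ":" suffix imitates the map-key rule. On ordinary keys both forms give the same answer; on a word without a colon both fail, but the nested form can only do so after trying the many ways of splitting the word into shorter "identifiers".

```elisp
;; Illustrative sketch; the my/ names are hypothetical, not from elixir-mode.

(defconst my/ident-flat "[A-Z_a-z][0-9A-Z_a-z]*[!?]?"
  "One identifier: exactly one leading character, then the tail.")

(defconst my/ident-nested "\\(?:[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?\\)+"
  "The original `identifiers' wrapped in a properly bracketed 1+.")

(defun my/match-map-key (ident-regexp string)
  "Match IDENT-REGEXP followed by \":\" at the start of STRING.
Return the matched text, or nil if there is no match."
  (let ((case-fold-search nil))
    (when (string-match (concat "\\`" ident-regexp ":") string)
      (match-string 0 string))))

;; Both forms accept an ordinary map key.
(my/match-map-key my/ident-flat   "valid?: true")   ; => "valid?:"
(my/match-map-key my/ident-nested "valid?: true")   ; => "valid?:"

;; Both reject a word with no colon; the nested form backtracks
;; through the possible splits of the word before giving up.
(my/match-map-key my/ident-flat   "abcdefghijkl ")  ; => nil
(my/match-map-key my/ident-nested "abcdefghijkl ")  ; => nil
```

The two regexps are not strictly equivalent: the nested form also accepts a string such as "a?b:", where a "?" appears in the middle, which the flat form rejects.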






Reply sent to Dmitry Gutov <dgutov <at> yandex.ru>:
You have taken responsibility. (Wed, 27 Nov 2019 21:59:01 GMT) Full text and rfc822 format available.

Notification sent to Dmitry Gutov <dgutov <at> yandex.ru>:
bug acknowledged by developer. (Wed, 27 Nov 2019 21:59:01 GMT) Full text and rfc822 format available.

Message #25 received at 38104-done <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 38104-done <at> debbugs.gnu.org
Subject: Re: bug#38104: 27.0.50; elixir-mode fontification is very slow
Date: Wed, 27 Nov 2019 23:58:46 +0200
Hi Mattias,

On 26.11.2019 21:32, Mattias Engdegård wrote:

> As it turned out, rx is fine (now); elixir-mode, not quite. In elixir-mode.el, we have
> 
>        (identifiers . ,(rx (one-or-more (any "A-Z" "a-z" "_"))
>                            (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
>                            (optional (or "?" "!"))))
> 
> First, this regex is suboptimal: the first character of an identifier should occur exactly once, or you get bad backtracking behaviour. Just remove the one-or-more construct:
> 
>        (identifiers . ,(rx (any "A-Z" "a-z" "_")
>                            (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
>                            (optional (or "?" "!"))))
> 
> This definition is then used in several places, but two in particular are of interest to us:
> 
>      ;; Module attributes
>      (,(elixir-rx (and "@" (1+ identifiers)))
> 
> The construct (1+ identifiers) was perhaps meant to match multiple identifiers, but it doesn't (no separator); it just matches an identifier in several ways, which again leads to bad backtracking behaviour.
> The same problem here:
> 
>      ;; Map keys
>      (,(elixir-rx (group (and (one-or-more identifiers) ":")) space)
> 
> Remove the 1+ and one-or-more and it's fast again.

That makes a lot of sense. I removed these one-or-more and 1+ constructs
(and a few others), and it became fast again.

I'll send a patch upstream. Thanks for your help!

(Looking at the tracker, they have a minor version of this change 
submitted already).

> Why did this "work" with the old rx implementation? Because that code had a nasty bug: it does not bracket definitions in rx-constituents properly. Example:
> 
> (let ((rx-constituents (cons '(hello . "HELLO") rx-constituents)))
>    (rx-to-string '(1+ hello) t))
> => "HELLO+"
> 
> The new rx implementation does not suffer from this bug.
> 
> The result in your case is that the old rx, when translating (1+ identifiers), only tacked the "+" onto whatever regexp 'identifiers' produced, resulting in
> 
> "[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?+"
> 
> which is a lot faster, since only the final [!?] is repeated twice (and it probably doesn't match very often).

It's funny to think how someone probably beat the current code into
submission by trial and error.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 26 Dec 2019 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 93 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.