GNU bug report logs - #13369
24.1; compile message parsing slow because of omake hack


Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> bredband.net>

Date: Sun, 6 Jan 2013 20:05:02 UTC

Severity: normal

Merged with 3700, 9065, 29554

Found in versions 24.0.50, 24.1, 25.3

To reply to this bug, email your comments to 13369 AT debbugs.gnu.org.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#13369; Package emacs. (Sun, 06 Jan 2013 20:05:02 GMT)

Acknowledgement sent to Mattias Engdegård <mattiase <at> bredband.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 06 Jan 2013 20:05:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> bredband.net>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.1; compile message parsing slow because of omake hack
Date: Sun, 6 Jan 2013 21:03:05 +0100
Parsing compilation messages in compilation-mode can be very slow for
large buffers (thousands of error lines); it can take many
seconds. Experiments show that it is the presence of omake in
compilation-error-regexp-alist that causes most of the trouble; removing
it mostly cures the problem.

The omake regexp does not look too troublesome, but there are some
omake-specific hacks in compile.el that are more worrying. In
particular, this code (in compilation-parse-errors) looks suspicious:

      (cond
       ((not (memq 'omake compilation-error-regexp-alist)) nil)
       ((string-match "\\`\\([^^]\\|^\\( \\*\\|\\[\\)\\)" pat)
        nil) ;; Not anchored or anchored but already allows empty spaces.
       (t (setq pat (concat "^ *" (substring pat 1)))))

The slightly alarming concept of regexp-matching a regexp aside, this
one doesn't make sense - shouldn't the ^ (following the \|) be escaped?
Apparently the code was at some time changed from

  (when (and (= ?^ (aref pat 0)) ; anchored: starts with "^"
             ;; but does not allow an arbitrary number of leading spaces
             (not (and (= ?  (aref pat 1)) (= ?* (aref pat 2)))))

which looks more correct, and conveys the intent somewhat better
(and may be more efficient than the regexp for all I know).

It's not clear to me how the present code could ever have worked.
At the very least the regexp in compilation-parse-errors should
be fixed.
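
The effect of the unescaped ^ can be checked in isolation. The following
is a minimal sketch, not code from compile.el: demo-omake-rewrite,
demo-buggy-re and demo-fixed-re are hypothetical names, and the helper
only mirrors the last two branches of the cond quoted above. With the
regexp as shipped, a pattern that already begins with "^ *" is not
recognized and gets a second " *" prepended, which may account for some
of the extra backtracking.

```elisp
;; Hypothetical names throughout; this only imitates the quoted `cond'.
(defconst demo-buggy-re "\\`\\([^^]\\|^\\( \\*\\|\\[\\)\\)"
  "The check as quoted in the report: the second ^ acts as an anchor.")

(defconst demo-fixed-re "\\`\\([^^]\\|\\^\\( \\*\\|\\[\\)\\)"
  "The same check with the second ^ escaped, so it matches a literal ^.")

(defun demo-omake-rewrite (pat check-re)
  "Return PAT as the quoted `cond' would leave it, testing with CHECK-RE."
  (if (string-match check-re pat)
      pat                                 ; left alone
    (concat "^ *" (substring pat 1))))    ; first char replaced by "^ *"

(demo-omake-rewrite "^ *foo" demo-buggy-re)  ; => "^ * *foo" (rewritten again)
(demo-omake-rewrite "^ *foo" demo-fixed-re)  ; => "^ *foo"   (unchanged)
(demo-omake-rewrite "^foo"   demo-fixed-re)  ; => "^ *foo"   (rewritten once)
(demo-omake-rewrite "foo"    demo-buggy-re)  ; => "foo"      (not anchored)
```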

In GNU Emacs 24.1.1 (powerpc-apple-darwin, NS apple-appkit-1038.36)
 of 2012-06-10 on bob.porkrind.org
Windowing system distributor `Apple', version 10.3.949






Message #8 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Mattias Engdegård <mattiase <at> bredband.net>
Cc: 13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Sun, 06 Jan 2013 20:24:36 -0500
Mattias Engdegård wrote:

> Parsing compilation messages in compilation-mode can be very slow for
> large buffers (thousands of error lines); it can take many
> seconds. Experiments show that it is the presence of omake in
> compilation-error-regexp-alist that causes most of the trouble; removing
> it mostly cures the problem.
[...]
> one doesn't make sense - shouldn't the ^ (following the \|) be escaped?

Yes, I think so.

Does making that change remove the slowdown that you see?





Message #11 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> bredband.net>
To: Glenn Morris <rgm <at> gnu.org>
Cc: 13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Mon, 7 Jan 2013 02:41:21 +0100
7 jan 2013 kl. 02.24 skrev Glenn Morris:
> Does making that change remove the slowdown that you see?

Substantially, but not entirely. (I can try measuring it exactly if  
you want it quantified, but it goes from being unusable to merely  
annoyingly sluggish.)






Message #14 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Mattias Engdegård <mattiase <at> bredband.net>
Cc: 13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Mon, 07 Jan 2013 03:14:28 -0500
Mattias Engdegård wrote:

> Substantially, but not entirely. (I can try measuring it exactly if
> you want it quantified, but it goes from being unusable to merely
> annoyingly sluggish.)

It might be useful to have some numbers, yes.
Could you compare the time with the \\^ change to the time with the
omake part of compilation-parse-errors commented out entirely?





Message #17 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> bredband.net>
To: Glenn Morris <rgm <at> gnu.org>
Cc: 13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Mon, 7 Jan 2013 22:50:20 +0100
7 jan 2013 kl. 09.14 skrev Glenn Morris:

> Could you compare the time with the \\^ change to the time with the
> omake part of compilation-parse-errors commented out entirely?

Here are the times, in seconds, for executing compilation-parse-errors
far down a large compile buffer (5000 lines, or about 440 KiB),
with and without omake present in compilation-error-regexp-alist:

omake
present    absent
30.3        3.2     Standard code
 6.5        3.2     repaired regexp (escaped ^)
 3.2        3.2     COND expression removed

In the last case, the entire COND surrounding the faulty regexp was
edited out.






Message #20 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Mattias Engdegård <mattiase <at> bredband.net>
Cc: 13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Tue, 08 Jan 2013 15:14:21 -0500
Mattias Engdegård wrote:

> Here are the times, in seconds, for executing compilation-parse-errors
> far down a large compile buffer (5000 lines, or about 440 KiB),
> with and without omake present in compilation-error-regexp-alist:
>
> omake
> present    absent
> 30.3        3.2     Standard code
>  6.5        3.2     repaired regexp (escaped ^)
>  3.2        3.2     COND expression removed

Thanks. Could you also give the numbers for
compilation-error-regexp-alist containing only `gnu' (assuming that is
the one that is relevant for your test case)?





Message #23 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> bredband.net>
To: Glenn Morris <rgm <at> gnu.org>
Cc: 13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Tue, 8 Jan 2013 22:09:52 +0100
8 jan 2013 kl. 21.14 skrev Glenn Morris:

> Thanks. Could you also give the numbers for
> compilation-error-regexp-alist containing only `gnu' (assuming that is
> the one that is relevant for your test case)?

These times are with a slightly different compilation buffer:

  all   no omake   gnu only
 32.7     3.4        0.3      standard code
  6.8     3.4        0.3      repaired regexp (escaped ^)
  3.4     3.4        0.3      COND expression removed







Message #26 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Mattias Engdegård <mattiase <at> bredband.net>
Cc: 13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Tue, 08 Jan 2013 17:40:57 -0500
Mattias Engdegård wrote:

> 8 jan 2013 kl. 21.14 skrev Glenn Morris:
>
>> Thanks. Could you also give the numbers for
>> compilation-error-regexp-alist containing only `gnu' (assuming that is
>> the one that is relevant for your test case)?
>
> These times are with a slightly different compilation buffer:
>
>   all   no omake   gnu only
>  32.7     3.4        0.3      standard code
>   6.8     3.4        0.3      repaired regexp (escaped ^)
>   3.4     3.4        0.3      COND expression removed

OK, thank you. So having fixed the omake ^ issue, basically to me it
just seems to be the case that the more entries are in
compilation-error-regexp-alist, the slower things get.

Maybe we should encourage people to prune it to only the entries they
use, or maybe some less common elements should not be there by default.
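
For anyone who wants to try the pruning suggested here, a minimal sketch
follows. It assumes compile.el can be loaded and touches only the
standard variable compilation-error-regexp-alist; the function name
my-compilation-prune-regexps is made up for illustration.

```elisp
(require 'compile)

(defun my-compilation-prune-regexps (&rest unwanted)
  "Remove each symbol in UNWANTED from `compilation-error-regexp-alist'.
Hypothetical helper for illustration; returns the pruned list."
  (dolist (sym unwanted)
    (setq compilation-error-regexp-alist
          (remq sym compilation-error-regexp-alist)))
  compilation-error-regexp-alist)

;; Drop the omake entry but keep everything else.
(my-compilation-prune-regexps 'omake)

;; Or, more drastically, keep only the GNU message format:
;; (setq compilation-error-regexp-alist '(gnu))
```

Removing a symbol that is not in the list is harmless, so the call can
sit in an init file regardless of the Emacs version in use.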





Message #29 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Glenn Morris <rgm <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> bredband.net>,
	13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Tue, 08 Jan 2013 20:47:09 -0500
>>> Thanks. Could you also give the numbers for
>>> compilation-error-regexp-alist containing only `gnu' (assuming that is
>>> the one that is relevant for your test case)?
>> These times are with a slightly different compilation buffer:
>> all   no omake   gnu only
>> 32.7     3.4        0.3      standard code
>> 6.8     3.4        0.3      repaired regexp (escaped ^)
>> 3.4     3.4        0.3      COND expression removed
> OK, thank you. So having fixed the omake ^ issue, basically to me it
> just seems to be the case that the more entries are in
> compilation-error-regexp-alist, the slower things get.
> Maybe we should encourage people to prune it to only the entries they
> use, or maybe some less common elements should not be there by default.

Yes, every entry costs time, which is why I've been resisting adding
more entries and would rather push the problem upstream to convince the
tools' authors to stick to the standard GNU message format.

I think compile.el would benefit from a different regex engine where we
could do a lex-style union of all regexps into a single automaton.


        Stefan




Merged 3700 9065 13369. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 09 Jan 2013 02:00:02 GMT)


Message #34 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> bredband.net>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Glenn Morris <rgm <at> gnu.org>, 13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Wed, 9 Jan 2013 12:11:33 +0100
9 jan 2013 kl. 02.47 skrev Stefan Monnier:

>>> all   no omake   gnu only
>>> 32.7     3.4        0.3      standard code
>>> 6.8     3.4        0.3      repaired regexp (escaped ^)
>>> 3.4     3.4        0.3      COND expression removed
>> OK, thank you. So having fixed the omake ^ issue, basically to me it
>> just seems to be the case that the more entries are in
>> compilation-error-regexp-alist, the slower things get.
>> Maybe we should encourage people to prune it to only the entries they
>> use, or maybe some less common elements should not be there by default.
>
> Yes, every entry costs time, which is why I've been resisting adding
> more entries and would rather push the problem upstream to convince the
> tools' authors to stick to the standard GNU message format.

Note however that omake is still special - while its own regexp is
fast and simple, its mere presence in the list causes the remaining
parsing to become twice as slow (as seen from the measurements above).
I'm also still somewhat suspicious of how the hack mutilates other
regexps in ways that may change their meaning.

In addition to fixing the regexp, I suggest omake be disabled by
default because of its impact and since it's somewhat of a special need.

> I think compile.el would benefit from a different regex engine where we
> could do a lex-style union of all regexps into a single automaton.

That would be nice, especially if the result could be a DFA.
I would also suggest switching to rx notation for the regexps.
(The ^ quoting bug is one that would never have occurred with rx,
and that is a very small regexp.)
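
As an illustration of that point, here is a sketch of how the repaired
check might look in rx. This is an assumption about what such a rewrite
could be, not code proposed in this report; demo-anchored-ok-re is a
hypothetical name. Because rx quotes string literals itself, a "^"
inside a string cannot be mistaken for an anchor.

```elisp
(require 'rx)

(defconst demo-anchored-ok-re
  (rx string-start
      (or (not (any "^"))              ; pattern is not anchored at all
          (seq "^" (or " *" "["))))    ; literal ^ followed by " *" or "["
  "Sketch: matches patterns that the omake hack should leave alone.")

(string-match demo-anchored-ok-re "^ *foo")  ; => 0   (already allows spaces)
(string-match demo-anchored-ok-re "foo")     ; => 0   (not anchored)
(string-match demo-anchored-ok-re "^foo")    ; => nil (would be rewritten)
```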

I actually wrote a simple regexp-to-rx translator, like rx in reverse,
just to be able to make sense of the ones in compile.el. I'd be happy
to share.






Message #37 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Jambunathan K <kjambunathan <at> gmail.com>
To: Mattias Engdegård <mattiase <at> bredband.net>
Cc: 13369 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Wed, 09 Jan 2013 19:12:41 +0530
Mattias Engdegård <mattiase <at> bredband.net> writes:

> I actually wrote a simple regexp-to-rx translator, like rx in reverse,
> just to be able to make sense of the ones in compile.el. I'd be happy
> to share.

Why not just share, instead of saying that you will be happy to do so?

I personally find rx easy to edit and use.  I am also drifting away from
Emacs lisp regexp to rx.

ps: Someone shared a perl(?)-to-Emacs regexes a couple of months ago and
wanted to include it as part of GNU ELPA.
-- 





Message #40 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> bredband.net>
To: Jambunathan K <kjambunathan <at> gmail.com>
Cc: 13369 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Wed, 9 Jan 2013 15:31:06 +0100
> Why not just share, instead of saying that you will be happy to do so?

Sorry, I just assumed that someone already wrote such a thing and that
it would be more polished than my amateurish attempt. Here it is.
[xr.el (application/octet-stream, attachment)]




Message #43 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Jambunathan K <kjambunathan <at> gmail.com>
To: Mattias Engdegård <mattiase <at> bredband.net>
Cc: 13369 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Wed, 09 Jan 2013 20:47:08 +0530

Mattias Engdegård <mattiase <at> bredband.net> writes:

Thanks, that was quick.  Maybe you want to indicate whether you want to
assign the copyright for that code to the FSF so that it could be improved
upon by others and distributed with Emacs or GNU ELPA.

>> Why not just share, instead of saying that you will be happy to do so?
>
> Sorry, I just assumed that someone already wrote such a thing 

[OT, The following comment concerns re-builder]

In re-builder, there is a way to convert between various regexp styles.
It is bound to C-c TAB by default.  It is not clear to me whether
re-builder supports rx-to-regexp conversions.

When I try converting the following regexp (C-h v org-heading-regexp) in
read format to rx format

        "^\\(\\*+\\)\\(?: +\\(.*?\\)\\)?[ \t]*$"

I am seeing that the re-builder translates that to 

    ,----
    | '()
    `----

with the following message 

    ,----
    | rx-form: Unknown rx form `nil'
    `----

I am not sure whether that counts as a bug.  It is possible that
re-builder doesn't support such translation or that I am using the
interface wrongly.

By contrast,

        (xr "^\\(\\*+\\)\\(?: +\\(.*?\\)\\)?[ \t]*$")

gives me

    (seq bol
         (group
          (one-or-more "*"))
         (opt
          (one-or-more " ")
          (group
           (minimal-match
            (zero-or-more nonl))))
         (zero-or-more
          (any "	" " "))
         eol)

> and that it would be more polished than my amateurish attempt. Here it
> is.

I will let others review the changes.  

Some libraries like org.el use complex regexps.  For someone who wants
to dig deep into what the regexps amount to, without resorting to
pen-and-paper, one can imagine a utility which overlays or tooltips a
regexp-like string with its rx counterpart.  It could be quite useful.






Message #46 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Mattias Engdegård <mattiase <at> bredband.net>
Cc: Glenn Morris <rgm <at> gnu.org>, 13369 <at> debbugs.gnu.org
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Wed, 09 Jan 2013 15:20:54 -0500
> I actually wrote a simple regexp-to-rx translator, like rx in reverse,
> just to be able to make sense of the ones in compile.el.  I'd be happy
> to share.

Reminds me of my old lex.el, so I've just added it to GNU ELPA.


        Stefan





Message #49 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> bredband.net>
To: Jambunathan K <kjambunathan <at> gmail.com>
Cc: 13369 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Thu, 10 Jan 2013 19:55:15 +0100
9 jan 2013 kl. 16.17 skrev Jambunathan K:

> Thanks, that was quick.  Maybe you want to indicate whether you want to
> assign the copyright for that code to the FSF so that it could be improved
> upon by others and distributed with Emacs or GNU ELPA.

Thank you, but I doubt I could get my employer to sign any copyright
papers, which to the best of my understanding is required for
distribution with Emacs. Please correct me if I'm wrong.






Message #52 received at 13369 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Mattias Engdegård <mattiase <at> bredband.net>
Cc: 13369 <at> debbugs.gnu.org, Jambunathan K <kjambunathan <at> gmail.com>
Subject: Re: bug#13369: 24.1;
	compile message parsing slow because of omake hack
Date: Thu, 10 Jan 2013 14:34:26 -0500
>> Thanks, that was quick.  Maybe you want to indicate whether you want to
>> assign the copyright for that code to the FSF so that it could be improved
>> upon by others and distributed with Emacs or GNU ELPA.
> Thank you, but I doubt I could get my employer to sign any copyright
> papers, which to the best of my understanding is required for
> distribution with Emacs. Please correct me if I'm wrong.

Indeed, it's needed, but only very few employers really refuse to sign
the relevant paperwork (which is a disclaimer that they have no
copyright interest in your work on Emacs).

Many employers will need some convincing (and reminding), but if I were
you I wouldn't give up just on the assumption that it can't be done.


        Stefan




Merged 3700 9065 13369 29554. Request was from Noam Postavsky <npostavs <at> users.sourceforge.net> to control <at> debbugs.gnu.org. (Tue, 05 Dec 2017 00:30:04 GMT)

This bug report was last modified 6 years and 165 days ago.


