GNU bug report logs - #36444
[PATCH] Improved regexp-opt KEEP-ORDER check

Previous Next

Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Sun, 30 Jun 2019 12:30:02 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36444 in the body.
You can then email your comments to 36444 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#36444; Package emacs. (Sun, 30 Jun 2019 12:30:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mattias Engdegård <mattiase <at> acm.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 30 Jun 2019 12:30:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] Improved regexp-opt KEEP-ORDER check
Date: Sun, 30 Jun 2019 14:28:57 +0200
[Message part 1 (text/plain, inline)]
Currently, regexp-opt does not attempt optimisation with KEEP-ORDER set if the input list contains a proper prefix of another element, like

 ("ab" "abcd")

on the grounds that the optimised string would be

 "ab\\(?:cd\\)?"

which would not preserve the match order. However, this also prevents

 ("abcd" "ab")

from being optimised, even though doing so would be harmless.

The attached patch strengthens the check, allowing more inputs to be optimised.

[0001-Optimise-more-inputs-to-regexp-opt.patch (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36444; Package emacs. (Wed, 03 Jul 2019 19:30:02 GMT) Full text and rfc822 format available.

Message #8 received at 36444 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 36444 <at> debbugs.gnu.org
Subject: Re: bug#36444: [PATCH] Improved regexp-opt KEEP-ORDER check
Date: Wed, 03 Jul 2019 15:29:33 -0400
Mattias Engdegård <mattiase <at> acm.org> writes:

> +             ;; The algorithm will generate a pattern that matches
> +             ;; longer strings in the list before shorter.  If the
> +             ;; list order matters, then no string must come after a
> +             ;; proper prefix of that string.  To check this, verify
> +             ;; that a straight or-pattern matches each string
> +             ;; entirely.
> +             ((and keep-order
> +                   (let* ((case-fold-search nil)
> +                          (alts (mapconcat #'regexp-quote strings "\\|")))
> +                     (and (save-match-data

You don't actually need this save-match-data, right?  Because there is
already one at the top level of the function (which I'm also not sure is
really needed, but probably best not to touch that).

> +                            (let ((s strings))
> +                              (while (and s
> +                                          (string-match alts (car s))
> +                                          (= (match-end 0) (length (car s))))
> +                                (setq s (cdr s)))
> +                              s))
> +                          (concat (or open "\\(?:") alts "\\)")))))

IMO, a dolist + catch & throw would be a bit more readable; it took me
some puzzling to realize that the early exit was the "non-optimized"
case.

  (and keep-order
       (let* ((case-fold-search nil)
              (alts (mapconcat #'regexp-quote strings "\\|")))
         (and (catch 'has-prefix
                (dolist (s strings)
                  (unless (and (string-match alts s)
                               (= (match-end 0) (length s)))
                    (throw 'has-prefix s))))
              (concat (or open "\\(?:") alts "\\)"))))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36444; Package emacs. (Thu, 04 Jul 2019 11:53:01 GMT) Full text and rfc822 format available.

Message #11 received at 36444 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: 36444 <at> debbugs.gnu.org
Subject: Re: bug#36444: [PATCH] Improved regexp-opt KEEP-ORDER check
Date: Thu, 4 Jul 2019 13:52:31 +0200
3 juli 2019 kl. 21.29 skrev Noam Postavsky <npostavs <at> gmail.com>:
> 
> You don't actually need this save-match-data, right?  Because there is
> already one at the top level of the function (which I'm also not sure is
> really needed, but probably best not to touch that).

Thank you! I don't know how I missed the existing save-match-data. Removed.

> IMO, a dolist + catch & throw would be a bit more readable; it took me
> some puzzling to realize that the early exit was the "non-optimized"
> case.
> 
>  (and keep-order
>       (let* ((case-fold-search nil)
>              (alts (mapconcat #'regexp-quote strings "\\|")))
>         (and (catch 'has-prefix
>                (dolist (s strings)
>                  (unless (and (string-match alts s)
>                               (= (match-end 0) (length s)))
>                    (throw 'has-prefix s))))
>              (concat (or open "\\(?:") alts "\\)"))))

Not too fond of that either, really; catch/throw somehow seems more heavyweight than warranted by the situation.
Initially I used cl-every, but ran into the usual bootstrap problems.
Here are two alternatives:

(1) Same as before, but with a comment about what tripped you up:

>         (and (let ((s strings))
>                (while (and s
>                            (string-match alts (car s))
>                            (= (match-end 0) (length (car s))))
>                  (setq s (cdr s)))
>                ;; If we exited early, we found evidence that
>                ;; regexp-opt-group cannot be used.
>                s)
>              (concat (or open "\\(?:") alts "\\)")))))

(2) Using cl-loop:

>         (and (not (cl-loop
>                    for s in strings
>                    always (and (string-match alts s)
>                                (= (match-end 0) (length s)))))
>              (concat (or open "\\(?:") alts "\\)")))))






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36444; Package emacs. (Thu, 04 Jul 2019 14:19:01 GMT) Full text and rfc822 format available.

Message #14 received at 36444 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Noam Postavsky <npostavs <at> gmail.com>, 36444 <at> debbugs.gnu.org
Subject: Re: bug#36444: [PATCH] Improved regexp-opt KEEP-ORDER check
Date: Thu, 04 Jul 2019 10:18:11 -0400
Mattias Engdegård <mattiase <at> acm.org> writes:

> Not too fond of that either, really; catch/throw somehow seems more
> heavyweight than warranted by the situation.

I've wondered if it's worth making a lexical variant of catch/throw that
could be compiled as goto for these kind of situations.

> (1) Same as before, but with a comment about what tripped you up:

> (2) Using cl-loop:

Assuming (eval-when-compile (require 'cl-lib)) avoids bootstapping
problems, I think the cl-loop variant is a bit neater; but either way is
fine with me.





Reply sent to Mattias Engdegård <mattiase <at> acm.org>:
You have taken responsibility. (Thu, 04 Jul 2019 15:19:01 GMT) Full text and rfc822 format available.

Notification sent to Mattias Engdegård <mattiase <at> acm.org>:
bug acknowledged by developer. (Thu, 04 Jul 2019 15:19:02 GMT) Full text and rfc822 format available.

Message #19 received at 36444-done <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: 36444-done <at> debbugs.gnu.org
Subject: Re: bug#36444: [PATCH] Improved regexp-opt KEEP-ORDER check
Date: Thu, 4 Jul 2019 17:18:40 +0200
4 juli 2019 kl. 16.18 skrev Noam Postavsky <npostavs <at> gmail.com>:
> 
>> Not too fond of that either, really; catch/throw somehow seems more
>> heavyweight than warranted by the situation.
> 
> I've wondered if it's worth making a lexical variant of catch/throw that
> could be compiled as goto for these kind of situations.

Yes, although in this case I'd settle for built-in some/every constructs without bootstrap trouble, or list comprehensions that aren't a mysterious corner of cl-loop.

>> (1) Same as before, but with a comment about what tripped you up:
> 
>> (2) Using cl-loop:
> 
> Assuming (eval-when-compile (require 'cl-lib)) avoids bootstapping
> problems, I think the cl-loop variant is a bit neater; but either way is
> fine with me.

Thank you; I went with the while-loop, on the grounds that it can be readily understood by the layman from first principles. I have yet to be entirely friends with cl-loop.
Pushed to master.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36444; Package emacs. (Thu, 04 Jul 2019 16:02:01 GMT) Full text and rfc822 format available.

Message #22 received at 36444-done <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Mattias Engdegård <mattiase <at> acm.org>, Noam
 Postavsky <npostavs <at> gmail.com>
Cc: 36444-done <at> debbugs.gnu.org
Subject: RE: bug#36444: [PATCH] Improved regexp-opt KEEP-ORDER check
Date: Thu, 4 Jul 2019 09:01:39 -0700 (PDT)
FWIW:

+1 for idiom `catch-throw' + `dolist' - clear to all Lispers.
-1 for `cl-loop' here (unlispy language).
 0 for `while' loop, `and' exit, and `setq' cdring (obfuscates a bit).




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 02 Aug 2019 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 262 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.