GNU bug report logs -
#36444
[PATCH] Improved regexp-opt KEEP-ORDER check
Previous Next
Reported by: Mattias Engdegård <mattiase <at> acm.org>
Date: Sun, 30 Jun 2019 12:30:02 UTC
Severity: normal
Tags: patch
Done: Mattias Engdegård <mattiase <at> acm.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36444 in the body.
You can then email your comments to 36444 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#36444
; Package
emacs
.
(Sun, 30 Jun 2019 12:30:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Mattias Engdegård <mattiase <at> acm.org>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Sun, 30 Jun 2019 12:30:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Currently, regexp-opt does not attempt optimisation with KEEP-ORDER set if the input list contains a proper prefix of another element, like
("ab" "abcd")
on the grounds that the optimised string would be
"ab\\(?:cd\\)?"
which would not preserve the match order. However, this also prevents
("abcd" "ab")
from being optimised, even though doing so would be harmless.
The attached patch strengthens the check, allowing more inputs to be optimised.
[0001-Optimise-more-inputs-to-regexp-opt.patch (application/octet-stream, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#36444
; Package
emacs
.
(Wed, 03 Jul 2019 19:30:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 36444 <at> debbugs.gnu.org (full text, mbox):
Mattias Engdegård <mattiase <at> acm.org> writes:
> + ;; The algorithm will generate a pattern that matches
> + ;; longer strings in the list before shorter. If the
> + ;; list order matters, then no string must come after a
> + ;; proper prefix of that string. To check this, verify
> + ;; that a straight or-pattern matches each string
> + ;; entirely.
> + ((and keep-order
> + (let* ((case-fold-search nil)
> + (alts (mapconcat #'regexp-quote strings "\\|")))
> + (and (save-match-data
You don't actually need this save-match-data, right? Because there is
already one at the top level of the function (which I'm also not sure is
really needed, but probably best not to touch that).
> + (let ((s strings))
> + (while (and s
> + (string-match alts (car s))
> + (= (match-end 0) (length (car s))))
> + (setq s (cdr s)))
> + s))
> + (concat (or open "\\(?:") alts "\\)")))))
IMO, a dolist + catch & throw would be a bit more readable; it took me
some puzzling to realize that the early exit was the "non-optimized"
case.
(and keep-order
(let* ((case-fold-search nil)
(alts (mapconcat #'regexp-quote strings "\\|")))
(and (catch 'has-prefix
(dolist (s strings)
(unless (and (string-match alts s)
(= (match-end 0) (length s)))
(throw 'has-prefix s))))
(concat (or open "\\(?:") alts "\\)"))))
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#36444
; Package
emacs
.
(Thu, 04 Jul 2019 11:53:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 36444 <at> debbugs.gnu.org (full text, mbox):
3 juli 2019 kl. 21.29 skrev Noam Postavsky <npostavs <at> gmail.com>:
>
> You don't actually need this save-match-data, right? Because there is
> already one at the top level of the function (which I'm also not sure is
> really needed, but probably best not to touch that).
Thank you! I don't know how I missed the existing save-match-data. Removed.
> IMO, a dolist + catch & throw would be a bit more readable; it took me
> some puzzling to realize that the early exit was the "non-optimized"
> case.
>
> (and keep-order
> (let* ((case-fold-search nil)
> (alts (mapconcat #'regexp-quote strings "\\|")))
> (and (catch 'has-prefix
> (dolist (s strings)
> (unless (and (string-match alts s)
> (= (match-end 0) (length s)))
> (throw 'has-prefix s))))
> (concat (or open "\\(?:") alts "\\)"))))
Not too fond of that either, really; catch/throw somehow seems more heavyweight than warranted by the situation.
Initially I used cl-every, but ran into the usual bootstrap problems.
Here are two alternatives:
(1) Same as before, but with a comment about what tripped you up:
> (and (let ((s strings))
> (while (and s
> (string-match alts (car s))
> (= (match-end 0) (length (car s))))
> (setq s (cdr s)))
> ;; If we exited early, we found evidence that
> ;; regexp-opt-group cannot be used.
> s)
> (concat (or open "\\(?:") alts "\\)")))))
(2) Using cl-loop:
> (and (not (cl-loop
> for s in strings
> always (and (string-match alts s)
> (= (match-end 0) (length s)))))
> (concat (or open "\\(?:") alts "\\)")))))
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#36444
; Package
emacs
.
(Thu, 04 Jul 2019 14:19:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 36444 <at> debbugs.gnu.org (full text, mbox):
Mattias Engdegård <mattiase <at> acm.org> writes:
> Not too fond of that either, really; catch/throw somehow seems more
> heavyweight than warranted by the situation.
I've wondered if it's worth making a lexical variant of catch/throw that
could be compiled as goto for these kind of situations.
> (1) Same as before, but with a comment about what tripped you up:
> (2) Using cl-loop:
Assuming (eval-when-compile (require 'cl-lib)) avoids bootstapping
problems, I think the cl-loop variant is a bit neater; but either way is
fine with me.
Reply sent
to
Mattias Engdegård <mattiase <at> acm.org>
:
You have taken responsibility.
(Thu, 04 Jul 2019 15:19:01 GMT)
Full text and
rfc822 format available.
Notification sent
to
Mattias Engdegård <mattiase <at> acm.org>
:
bug acknowledged by developer.
(Thu, 04 Jul 2019 15:19:02 GMT)
Full text and
rfc822 format available.
Message #19 received at 36444-done <at> debbugs.gnu.org (full text, mbox):
4 juli 2019 kl. 16.18 skrev Noam Postavsky <npostavs <at> gmail.com>:
>
>> Not too fond of that either, really; catch/throw somehow seems more
>> heavyweight than warranted by the situation.
>
> I've wondered if it's worth making a lexical variant of catch/throw that
> could be compiled as goto for these kind of situations.
Yes, although in this case I'd settle for built-in some/every constructs without bootstrap trouble, or list comprehensions that aren't a mysterious corner of cl-loop.
>> (1) Same as before, but with a comment about what tripped you up:
>
>> (2) Using cl-loop:
>
> Assuming (eval-when-compile (require 'cl-lib)) avoids bootstapping
> problems, I think the cl-loop variant is a bit neater; but either way is
> fine with me.
Thank you; I went with the while-loop, on the grounds that it can be readily understood by the layman from first principles. I have yet to be entirely friends with cl-loop.
Pushed to master.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#36444
; Package
emacs
.
(Thu, 04 Jul 2019 16:02:01 GMT)
Full text and
rfc822 format available.
Message #22 received at 36444-done <at> debbugs.gnu.org (full text, mbox):
FWIW:
+1 for idiom `catch-throw' + `dolist' - clear to all Lispers.
-1 for `cl-loop' here (unlispy language).
0 for `while' loop, `and' exit, and `setq' cdring (obfuscates a bit).
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 02 Aug 2019 11:24:04 GMT)
Full text and
rfc822 format available.
This bug report was last modified 4 years and 262 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.