Package: emacs;
Reported by: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Fri, 24 Oct 2025 21:53:02 UTC
Severity: normal
Tags: patch
To reply to this bug, email your comments to 79693 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
juri <at> linkov.net, monnier <at> iro.umontreal.ca, dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Fri, 24 Oct 2025 21:53:02 GMT) Full text and rfc822 format available.Spencer Baugh <sbaugh <at> janestreet.com>:juri <at> linkov.net, monnier <at> iro.umontreal.ca, dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org.
(Fri, 24 Oct 2025 21:53:02 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Spencer Baugh <sbaugh <at> janestreet.com> To: bug-gnu-emacs <at> gnu.org Subject: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Fri, 24 Oct 2025 17:52:38 -0400
[Message part 1 (text/plain, inline)]
Tags: patch For example, before my patch, the following in the Emacs repo takes 4 seconds on my machine, and afterwards it takes 5 milliseconds. (progn (setq my-files (project-files (project-current))) (benchmark-run 1 (completion-pcm-all-completions "*/vc" my-files nil 0))) In GNU Emacs 30.1.90 (build 65, x86_64-pc-linux-gnu, X toolkit, cairo version 1.15.12, Xaw scroll bars) of 2025-10-23 built on igm-qws-u22796a Repository revision: 36f12b133cdf7966901a2a28c96ff358658f545a Repository branch: emacs-30 Windowing system distributor 'The X.Org Foundation', version 11.0.12011000 System Description: Rocky Linux 8.10 (Green Obsidian) Configured using: 'configure --with-x-toolkit=lucid --without-gpm --without-gconf --without-selinux --without-imagemagick --with-modules --with-gif=no --with-cairo --with-rsvg --without-compress-install --with-tree-sitter --with-native-compilation=aot PKG_CONFIG_PATH=/usr/local/home/garnish/libtree-sitter/0.22.6-1/lib/pkgconfig/'
[0001-Optimize-PCM-regex-to-not-contain-adjacent-wildcards.patch (text/patch, attachment)]
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Sat, 25 Oct 2025 08:26:02 GMT) Full text and rfc822 format available.Message #8 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Spencer Baugh <sbaugh <at> janestreet.com> Cc: dmitry <at> gutov.dev, 79693 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, juri <at> linkov.net Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Sat, 25 Oct 2025 11:25:24 +0300
> Cc: Juri Linkov <juri <at> linkov.net>, Stefan Monnier <monnier <at> iro.umontreal.ca>, > Dmitry Gutov <dmitry <at> gutov.dev> > Date: Fri, 24 Oct 2025 17:52:38 -0400 > From: Spencer Baugh via "Bug reports for GNU Emacs, > the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org> > > For example, before my patch, the following in the Emacs repo takes 4 > seconds on my machine, and afterwards it takes 5 milliseconds. > (progn > (setq my-files (project-files (project-current))) > (benchmark-run 1 (completion-pcm-all-completions "*/vc" my-files nil 0))) The performance gain in that case is indeed impressive, but how do we verify we don't get regressions in some other cases? Rgexps are tricky, and we don't seem to have enough tests for this.
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Mon, 27 Oct 2025 01:59:02 GMT) Full text and rfc822 format available.Message #11 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Stefan Monnier <monnier <at> iro.umontreal.ca> To: Spencer Baugh <sbaugh <at> janestreet.com> Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 79693 <at> debbugs.gnu.org, Juri Linkov <juri <at> linkov.net> Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Sun, 26 Oct 2025 21:58:15 -0400
> @@ -4291,36 +4291,64 @@ completion-pcm--optimize-pattern
> (_ (push (pop p) n))))
> (nreverse n)))
>
> +(defun completion-pcm--pattern->segments (pattern)
> + "Segment PATTERN into more structured sublists.
> +
> +Returns a list of lists which when concatenated is semantically the same
> +as PATTERN.
> +
> +The first element in each sublist is a (possibly empty) string. The
> +remaining elements in the sublist are all wildcard symbols. If PATTERN
> +ends with a wildcard, then each sublist is guaranteed to have at least
> +one wildcard. Each sublist either contains no `all-delim' wildcards, or
> +only `all-delim' wildcards; this is useful since `all-delim' wildcards
> +need to be matched by a different regex than other wildcards."
`all-delim` => `any-delim`
> + (let (ret)
> + (while pattern
> + (let ((fixed "")
> + wildcards)
> + ;; Pop strings from PATTERN and concatenate them.
> + (while (stringp (car-safe pattern))
> + (setq fixed (concat fixed (pop pattern))))
> + ;; Pop wildcards of the same type from PATTERN.
> + (if (eq (car-safe pattern) 'any-delim)
> + (while (eq (car-safe pattern) 'any-delim)
> + (push (pop pattern) wildcards))
> + (while (and pattern
> + (symbolp (car-safe pattern))
> + (not (eq (car-safe pattern) 'any-delim)))
> + (push (pop pattern) wildcards)))
> + ;; The sublist is a fixed string followed by all the wildcards.
> + (push (cons fixed (nreverse wildcards)) ret)))
> + (nreverse ret)))
IIUC with a pattern like ("FOO" any-delim any), we generate a result of
the form
(("FOO" any-delim)
("" any))
Wouldn't it make more sense to simplify it to
(("FOO" any-delim any))
which can turn into a regexp like "FOO[z-a]*"?
> +(defun completion-pcm--segments->regex (segments &optional group)
> + (concat "\\`"
> + (mapconcat
> + (lambda (segment)
> + (concat
> + (regexp-quote (car segment))
> + (when group "\\(")
> + (if (eq (cadr segment) 'any-delim)
> + (concat completion-pcm--delim-wild-regex "*?")
> + "[^z-a]*?")
> + (when group "\\)")))
> + segments
> + "")))
With the above, I guess the idea here would be to check (all (lambda (x)
(eq x 'any-delim)) (cdr segment)) instead of (eq (cadr segment)
'any-delim).
I guess you thought of that but it has problems, in which case please
add a comment in the code explaining the problem.
> +(defun completion-pcm--segments-point-idx (segments)
> + "Return index of subgroup corresponding to `point' element of SEGMENTS.
> +Return nil if there's no such element.
> +This is used in combination with `completion-pcm--segments->regex'."
> (let ((idx nil)
> (i 0))
> - (dolist (x pattern)
> - (unless (stringp x)
> - (incf i)
> - (if (eq x 'point) (setq idx i))))
> + (dolist (x segments)
> + (incf i)
> + (when (memq 'point (cdr x))
> + (setq idx i)))
> idx))
Hmm... so IIUC we don't distinguish segments of the form (STR point star)
from segments of the form (STR star point), right?
I'm wondering whether it's OK, or whether on the contrary we should be
careful about it because it affects the final position of point.
Stefan
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Mon, 27 Oct 2025 16:59:02 GMT) Full text and rfc822 format available.Message #14 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Spencer Baugh <sbaugh <at> janestreet.com> To: Stefan Monnier <monnier <at> iro.umontreal.ca> Cc: Dmitry Gutov <dmitry <at> gutov.dev>, Juri Linkov <juri <at> linkov.net>, 79693 <at> debbugs.gnu.org Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Mon, 27 Oct 2025 12:57:58 -0400
[Message part 1 (text/plain, inline)]
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
>> @@ -4291,36 +4291,64 @@ completion-pcm--optimize-pattern
>> (_ (push (pop p) n))))
>> (nreverse n)))
>>
>> +(defun completion-pcm--pattern->segments (pattern)
>> + "Segment PATTERN into more structured sublists.
>> +
>> +Returns a list of lists which when concatenated is semantically the same
>> +as PATTERN.
>> +
>> +The first element in each sublist is a (possibly empty) string. The
>> +remaining elements in the sublist are all wildcard symbols. If PATTERN
>> +ends with a wildcard, then each sublist is guaranteed to have at least
>> +one wildcard. Each sublist either contains no `all-delim' wildcards, or
>> +only `all-delim' wildcards; this is useful since `all-delim' wildcards
>> +need to be matched by a different regex than other wildcards."
>
> `all-delim` => `any-delim`
(Text removed in the next version)
>> + (let (ret)
>> + (while pattern
>> + (let ((fixed "")
>> + wildcards)
>> + ;; Pop strings from PATTERN and concatenate them.
>> + (while (stringp (car-safe pattern))
>> + (setq fixed (concat fixed (pop pattern))))
>> + ;; Pop wildcards of the same type from PATTERN.
>> + (if (eq (car-safe pattern) 'any-delim)
>> + (while (eq (car-safe pattern) 'any-delim)
>> + (push (pop pattern) wildcards))
>> + (while (and pattern
>> + (symbolp (car-safe pattern))
>> + (not (eq (car-safe pattern) 'any-delim)))
>> + (push (pop pattern) wildcards)))
>> + ;; The sublist is a fixed string followed by all the wildcards.
>> + (push (cons fixed (nreverse wildcards)) ret)))
>> + (nreverse ret)))
>
> IIUC with a pattern like ("FOO" any-delim any), we generate a result of
> the form
>
> (("FOO" any-delim)
> ("" any))
>
> Wouldn't it make more sense to simplify it to
>
> (("FOO" any-delim any))
>
> which can turn into a regexp like "FOO[z-a]*"?
Yes, good catch! That's both faster to run and simpler to code. Done
in the attached patch.
>> +(defun completion-pcm--segments->regex (segments &optional group)
>> + (concat "\\`"
>> + (mapconcat
>> + (lambda (segment)
>> + (concat
>> + (regexp-quote (car segment))
>> + (when group "\\(")
>> + (if (eq (cadr segment) 'any-delim)
>> + (concat completion-pcm--delim-wild-regex "*?")
>> + "[^z-a]*?")
>> + (when group "\\)")))
>> + segments
>> + "")))
>
> With the above, I guess the idea here would be to check (all (lambda (x)
> (eq x 'any-delim)) (cdr segment)) instead of (eq (cadr segment)
> 'any-delim).
>
> I guess you thought of that but it has problems, in which case please
> add a comment in the code explaining the problem.
Nope, I just didn't think of it. :)
>> +(defun completion-pcm--segments-point-idx (segments)
>> + "Return index of subgroup corresponding to `point' element of SEGMENTS.
>> +Return nil if there's no such element.
>> +This is used in combination with `completion-pcm--segments->regex'."
>> (let ((idx nil)
>> (i 0))
>> - (dolist (x pattern)
>> - (unless (stringp x)
>> - (incf i)
>> - (if (eq x 'point) (setq idx i))))
>> + (dolist (x segments)
>> + (incf i)
>> + (when (memq 'point (cdr x))
>> + (setq idx i)))
>> idx))
>
> Hmm... so IIUC we don't distinguish segments of the form (STR point star)
> from segments of the form (STR star point), right?
> I'm wondering whether it's OK, or whether on the contrary we should be
> careful about it because it affects the final position of point.
True, we do need to be careful about it. In fact, this code was already
buggy before my change: if there was a * before point, we'd highlight
the wrong characters as the first difference.
But since this bug was pre-existing, probably I should fix it in a
separate patch. So for now I've just added a test showing the bug.
(Since it only affects highlighting, it's not that critical)
[0001-Optimize-PCM-regex-to-not-contain-adjacent-wildcards.patch (text/x-patch, inline)]
From bc9e8f8d715862ca799602f015c42d25b98d9d86 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Fri, 24 Oct 2025 15:58:44 -0400
Subject: [PATCH] Optimize PCM regex to not contain adjacent wildcards
When multiple wildcards occur in a PCM pattern,
completion-pcm--pattern->regex previously would generate one
instance of [^z-a]* for each of those wildcards, even if the
wildcards were adjacent and could therefore be matched by a
single [^z-a]*. This can make regex matching performance much
worse. For example, with a minibuffer containing "*/" with
point at the start, completion-pcm-all-completions would take
several seconds to match in project-find-file in the Emacs repo.
Now, we run completion-pcm--pattern->segments on the pattern
first, which adds additional structure to the pattern, including
consolidating adjacent regexes into a single sublist. Then
completion-pcm--segments->regex generates a single regex
wildcard for each of those pattern wildcards. As a consequence,
we need to update the callers of completion-pcm--pattern->regex
which pass non-nil GROUP. This provides a substantial
performance improvement in various edge cases.
* lisp/minibuffer.el (completion-pcm--pattern->segments)
(completion-pcm--segments->regex): Add. (bug#79693)
(completion-pcm--pattern->regex)
(completion-pcm--merge-completions): Call pattern->segments and
segments->regex.
(completion-pcm--pattern-point-idx): Delete.
(completion-pcm--segments-point-idx): Add.
(completion-pcm--hilit-commonality): Call
completion-pcm--segments-point-idx to find the submatch
containing point.
* test/lisp/minibuffer-tests.el (completion-pcm-test-5): Add
more tests of highlighting the first difference.
---
lisp/minibuffer.el | 110 +++++++++++++++++++---------------
test/lisp/minibuffer-tests.el | 24 +++++++-
2 files changed, 84 insertions(+), 50 deletions(-)
diff --git a/lisp/minibuffer.el b/lisp/minibuffer.el
index 9a78f631e02..678e4b1f104 100644
--- a/lisp/minibuffer.el
+++ b/lisp/minibuffer.el
@@ -4291,36 +4291,57 @@ completion-pcm--optimize-pattern
(_ (push (pop p) n))))
(nreverse n)))
+(defun completion-pcm--pattern->segments (pattern)
+ "Segment PATTERN into more structured sublists.
+
+Returns a list of lists which when concatenated is semantically the same
+as PATTERN.
+
+The first element in each sublist is a (possibly empty) string. The
+remaining elements in the sublist are all wildcard symbols. If PATTERN
+ends with a wildcard, then each sublist is guaranteed to have at least
+one wildcard."
+ (let (ret)
+ (while pattern
+ (let ((fixed "")
+ wildcards)
+ ;; Pop strings from PATTERN and concatenate them.
+ (while (stringp (car-safe pattern))
+ (setq fixed (concat fixed (pop pattern))))
+ ;; Pop wildcards from PATTERN.
+ (while (and pattern (symbolp (car-safe pattern)))
+ (push (pop pattern) wildcards))
+ ;; The sublist is a fixed string followed by all the wildcards.
+ (push (cons fixed (nreverse wildcards)) ret)))
+ (nreverse ret)))
+
+(defun completion-pcm--segments->regex (segments &optional group)
+ (concat "\\`"
+ (mapconcat
+ (lambda (segment)
+ (concat
+ (regexp-quote (car segment))
+ (when group "\\(")
+ (if (cl-every (lambda (x) (eq x 'any-delim)) (cdr segment))
+ (concat completion-pcm--delim-wild-regex "*?")
+ "[^z-a]*?")
+ (when group "\\)")))
+ segments
+ "")))
+
(defun completion-pcm--pattern->regex (pattern &optional group)
- (let ((re
- (concat "\\`"
- (mapconcat
- (lambda (x)
- (cond
- ((stringp x) (regexp-quote x))
- (t
- (let ((re (if (eq x 'any-delim)
- (concat completion-pcm--delim-wild-regex "*?")
- "[^z-a]*?")))
- (if (if (consp group) (memq x group) group)
- (concat "\\(" re "\\)")
- re)))))
- pattern
- ""))))
- ;; Avoid pathological backtracking.
- (while (string-match "\\.\\*\\?\\(?:\\\\[()]\\)*\\(\\.\\*\\?\\)" re)
- (setq re (replace-match "" t t re 1)))
- re))
-
-(defun completion-pcm--pattern-point-idx (pattern)
- "Return index of subgroup corresponding to `point' element of PATTERN.
-Return nil if there's no such element."
+ (completion-pcm--segments->regex (completion-pcm--pattern->segments pattern) group))
+
+(defun completion-pcm--segments-point-idx (segments)
+ "Return index of subgroup corresponding to `point' element of SEGMENTS.
+Return nil if there's no such element.
+This is used in combination with `completion-pcm--segments->regex'."
(let ((idx nil)
(i 0))
- (dolist (x pattern)
- (unless (stringp x)
- (incf i)
- (if (eq x 'point) (setq idx i))))
+ (dolist (x segments)
+ (incf i)
+ (when (memq 'point (cdr x))
+ (setq idx i)))
idx))
(defun completion-pcm--all-completions (prefix pattern table pred)
@@ -4551,8 +4572,9 @@ completion-pcm--hilit-commonality
completion-lazy-hilit-fn nil)
(cond
((and completions (cl-loop for e in pattern thereis (stringp e)))
- (let* ((re (completion-pcm--pattern->regex pattern 'group))
- (point-idx (completion-pcm--pattern-point-idx pattern)))
+ (let* ((segments (completion-pcm--pattern->segments pattern))
+ (re (completion-pcm--segments->regex segments 'group))
+ (point-idx (completion-pcm--segments-point-idx segments)))
(setq completion-pcm--regexp re)
(cond (completion-lazy-hilit
(setq completion-lazy-hilit-fn
@@ -4696,12 +4718,13 @@ completion-pcm--merge-completions
(cond
((null (cdr strs)) (list (car strs)))
(t
- (let ((re (completion-pcm--pattern->regex pattern 'group))
+ (let ((segmented (completion-pcm--pattern->segments (append pattern '(any))))
(ccs ())) ;Chopped completions.
;; First chop each string into the parts corresponding to each
;; non-constant element of `pattern', using regexp-matching.
- (let ((case-fold-search completion-ignore-case))
+ (let ((re (concat (completion-pcm--segments->regex segmented t) "\\'"))
+ (case-fold-search completion-ignore-case))
(dolist (str strs)
(unless (string-match re str)
(error "Internal error: %s doesn't match %s" str re))
@@ -4713,24 +4736,15 @@ completion-pcm--merge-completions
(push (substring str last next) chopped)
(setq last next)
(setq i (1+ i)))
- ;; Add the text corresponding to the implicit trailing `any'.
- (push (substring str last) chopped)
(push (nreverse chopped) ccs))))
;; Then for each of those non-constant elements, extract the
;; commonality between them.
- (let ((res ())
- (fixed "")
- ;; Accumulate each stretch of wildcards, and process them as a unit.
- (wildcards ()))
- ;; Make the implicit trailing `any' explicit.
- (dolist (elem (append pattern '(any)))
- (if (stringp elem)
- (progn
- (setq fixed (concat fixed elem))
- (setq wildcards nil))
+ (let ((res ()))
+ (dolist (elem segmented)
+ (let ((fixed (car elem))
+ (wildcards (cdr elem)))
(let ((comps ()))
- (push elem wildcards)
(dolist (cc (prog1 ccs (setq ccs nil)))
(push (car cc) comps)
(push (cdr cc) ccs))
@@ -4768,7 +4782,8 @@ completion-pcm--merge-completions
(push prefix res)
;; Push all the wildcards in this stretch, to preserve `point' and
;; `star' wildcards before ELEM.
- (setq res (append wildcards res))
+ (dolist (wildcard wildcards)
+ (push wildcard res))
;; Extract common suffix additionally to common prefix.
;; Don't do it for `any' since it could lead to a merged
;; completion that doesn't itself match the candidates.
@@ -4786,10 +4801,7 @@ completion-pcm--merge-completions
comps))))))
(cl-assert (stringp suffix))
(unless (equal suffix "")
- (push suffix res))))
- ;; We pushed these wildcards on RES, so we're done with them.
- (setq wildcards nil))
- (setq fixed "")))))
+ (push suffix res)))))))))
;; We return it in reverse order.
res)))))
diff --git a/test/lisp/minibuffer-tests.el b/test/lisp/minibuffer-tests.el
index c7b24a4928d..ca4f9f7ced5 100644
--- a/test/lisp/minibuffer-tests.el
+++ b/test/lisp/minibuffer-tests.el
@@ -268,7 +268,29 @@ completion-pcm-test-5
(should (null
(completion--pcm-first-difference-pos
(car (completion-pcm-all-completions
- "f" '("few" "many") nil 0))))))
+ "f" '("few" "many") nil 0)))))
+ (should (equal
+ (completion--pcm-first-difference-pos
+ (car (completion-pcm-all-completions
+ "a*" '("ab" "ac") nil 2)))
+ 1))
+ (should (equal
+ (completion--pcm-first-difference-pos
+ (car (completion-pcm-all-completions
+ "a*" '("ab" "ac") nil 1)))
+ 1))
+ (should (equal
+ (completion--pcm-first-difference-pos
+ (car (completion-pcm-all-completions
+ "a*x" '("abx" "acx") nil 2)))
+ 1))
+ (should (equal
+ (completion--pcm-first-difference-pos
+ (car (completion-pcm-all-completions
+ "a*x" '("abxd" "acxe") nil 3)))
+ ;; FIXME: the highlighting should start at the 4th character
+ ;; rather than the third.
+ 1)))
(ert-deftest completion-pcm-test-6 ()
;; Wildcards and delimiters work
--
2.43.7
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Mon, 27 Oct 2025 17:02:01 GMT) Full text and rfc822 format available.Message #17 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Spencer Baugh <sbaugh <at> janestreet.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: dmitry <at> gutov.dev, juri <at> linkov.net, monnier <at> iro.umontreal.ca, 79693 <at> debbugs.gnu.org Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Mon, 27 Oct 2025 13:01:19 -0400
Eli Zaretskii <eliz <at> gnu.org> writes: >> Cc: Juri Linkov <juri <at> linkov.net>, Stefan Monnier <monnier <at> iro.umontreal.ca>, >> Dmitry Gutov <dmitry <at> gutov.dev> >> Date: Fri, 24 Oct 2025 17:52:38 -0400 >> From: Spencer Baugh via "Bug reports for GNU Emacs, >> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org> >> >> For example, before my patch, the following in the Emacs repo takes 4 >> seconds on my machine, and afterwards it takes 5 milliseconds. >> (progn >> (setq my-files (project-files (project-current))) >> (benchmark-run 1 (completion-pcm-all-completions "*/vc" my-files nil 0))) > > The performance gain in that case is indeed impressive, but how do we > verify we don't get regressions in some other cases? Rgexps are > tricky, and we don't seem to have enough tests for this. I'm not observing any regressions in my usage. I believe we are unlikely to have performance regressions because the regexps are getting strictly simpler. The only thing changing about them is the removal of "[^z-a]*?" instances where there were previously multiple of those in sequence. I think it's highly unlikely that removing "[^z-a]*?" from a regexp will make it run slower.
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Mon, 27 Oct 2025 17:45:02 GMT) Full text and rfc822 format available.Message #20 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Spencer Baugh <sbaugh <at> janestreet.com> Cc: dmitry <at> gutov.dev, juri <at> linkov.net, monnier <at> iro.umontreal.ca, 79693 <at> debbugs.gnu.org Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Mon, 27 Oct 2025 19:43:58 +0200
> From: Spencer Baugh <sbaugh <at> janestreet.com> > Cc: dmitry <at> gutov.dev, 79693 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, > juri <at> linkov.net > Date: Mon, 27 Oct 2025 13:01:19 -0400 > > Eli Zaretskii <eliz <at> gnu.org> writes: > > > The performance gain in that case is indeed impressive, but how do we > > verify we don't get regressions in some other cases? Rgexps are > > tricky, and we don't seem to have enough tests for this. > > I'm not observing any regressions in my usage. That's good to hear, but I'm quite sure there are many other usage patterns. > I believe we are unlikely to have performance regressions because the > regexps are getting strictly simpler. The only thing changing about > them is the removal of "[^z-a]*?" instances where there were previously > multiple of those in sequence. > > I think it's highly unlikely that removing "[^z-a]*?" from a regexp will > make it run slower. This kind of reasoning tends to fail us time and again. Is it possible to come up with some decent body of tests in various scenarios, to give us more confidence that we don't favor one use case at the expense of others?
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Mon, 27 Oct 2025 18:34:02 GMT) Full text and rfc822 format available.Message #23 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Stefan Monnier <monnier <at> iro.umontreal.ca> To: Eli Zaretskii <eliz <at> gnu.org> Cc: Spencer Baugh <sbaugh <at> janestreet.com>, juri <at> linkov.net, 79693 <at> debbugs.gnu.org, dmitry <at> gutov.dev Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Mon, 27 Oct 2025 14:33:03 -0400
>> I believe we are unlikely to have performance regressions because the
>> regexps are getting strictly simpler. The only thing changing about
>> them is the removal of "[^z-a]*?" instances where there were previously
>> multiple of those in sequence.
>>
>> I think it's highly unlikely that removing "[^z-a]*?" from a regexp will
>> make it run slower.
>
> This kind of reasoning tends to fail us time and again.
>
> Is it possible to come up with some decent body of tests in various
> scenarios, to give us more confidence that we don't favor one use case
> at the expense of others?
Not sure if you're worried about semantics or performance, but in terms
of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*".
I think it would take a *really* weird corner case to show otherwise
(like maybe some situation where the extra work performed for
"[^z-a]*[^z-a]*" causes extra paging which pushes some other task out of
its working set and thus indirectly ends up giving Emacs more CPU
time?).
Stefan
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Mon, 27 Oct 2025 18:44:02 GMT) Full text and rfc822 format available.Message #26 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Stefan Monnier <monnier <at> iro.umontreal.ca> Cc: sbaugh <at> janestreet.com, juri <at> linkov.net, 79693 <at> debbugs.gnu.org, dmitry <at> gutov.dev Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Mon, 27 Oct 2025 20:42:54 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca> > Cc: Spencer Baugh <sbaugh <at> janestreet.com>, dmitry <at> gutov.dev, > 79693 <at> debbugs.gnu.org, juri <at> linkov.net > Date: Mon, 27 Oct 2025 14:33:03 -0400 > > >> I believe we are unlikely to have performance regressions because the > >> regexps are getting strictly simpler. The only thing changing about > >> them is the removal of "[^z-a]*?" instances where there were previously > >> multiple of those in sequence. > >> > >> I think it's highly unlikely that removing "[^z-a]*?" from a regexp will > >> make it run slower. > > > > This kind of reasoning tends to fail us time and again. > > > > Is it possible to come up with some decent body of tests in various > > scenarios, to give us more confidence that we don't favor one use case > > at the expense of others? > > Not sure if you're worried about semantics or performance, but in terms > of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*". > I think it would take a *really* weird corner case to show otherwise > (like maybe some situation where the extra work performed for > "[^z-a]*[^z-a]*" causes extra paging which pushes some other task out of > its working set and thus indirectly ends up giving Emacs more CPU > time?). But the patch does much more that replace "[^z-a]*[^z-a]*" with "[^z-a]*"!
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Mon, 27 Oct 2025 21:03:01 GMT) Full text and rfc822 format available.Message #29 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Stefan Monnier <monnier <at> iro.umontreal.ca> To: Eli Zaretskii <eliz <at> gnu.org> Cc: sbaugh <at> janestreet.com, juri <at> linkov.net, 79693 <at> debbugs.gnu.org, dmitry <at> gutov.dev Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Mon, 27 Oct 2025 17:01:55 -0400
> But the patch does much more that replace "[^z-a]*[^z-a]*" with "[^z-a]*"!
Indeed. I don't see anything that worries me performance-wise in there,
but of course any change is a risk (contrarily to the previous change
that Spencer had proposed in that area).
Stefan "in favor of Spencer's patch, in case there was any doubt"
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Mon, 27 Oct 2025 21:52:01 GMT) Full text and rfc822 format available.Message #32 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Spencer Baugh <sbaugh <at> janestreet.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: dmitry <at> gutov.dev, 79693 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>, juri <at> linkov.net Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Mon, 27 Oct 2025 17:51:24 -0400
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes: >> From: Stefan Monnier <monnier <at> iro.umontreal.ca> >> Cc: Spencer Baugh <sbaugh <at> janestreet.com>, dmitry <at> gutov.dev, >> 79693 <at> debbugs.gnu.org, juri <at> linkov.net >> Date: Mon, 27 Oct 2025 14:33:03 -0400 >> >> >> I believe we are unlikely to have performance regressions because the >> >> regexps are getting strictly simpler. The only thing changing about >> >> them is the removal of "[^z-a]*?" instances where there were previously >> >> multiple of those in sequence. >> >> >> >> I think it's highly unlikely that removing "[^z-a]*?" from a regexp will >> >> make it run slower. >> > >> > This kind of reasoning tends to fail us time and again. >> > >> > Is it possible to come up with some decent body of tests in various >> > scenarios, to give us more confidence that we don't favor one use case >> > at the expense of others? >> >> Not sure if you're worried about semantics or performance, but in terms >> of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*". >> I think it would take a *really* weird corner case to show otherwise >> (like maybe some situation where the extra work performed for >> "[^z-a]*[^z-a]*" causes extra paging which pushes some other task out of >> its working set and thus indirectly ends up giving Emacs more CPU >> time?). > > But the patch does much more that replace "[^z-a]*[^z-a]*" with "[^z-a]*"! Okay, that's reasonable: it's not immediately obvious that that's the only thing the patch does, there could easily be bugs in the patch which make it do something else too. So, I have now also added some tests which verify that the generated regexes have the expected form. If you run these tests on an old version, you can see that the only effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?", or "[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?". I think that's enough to be confident it's right.
[0001-Optimize-PCM-regex-to-not-contain-adjacent-wildcards.patch (text/x-patch, inline)]
From d262787082eca14455b02a3868c70f9e11835cb8 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Fri, 24 Oct 2025 15:58:44 -0400
Subject: [PATCH] Optimize PCM regex to not contain adjacent wildcards
When multiple wildcards occur in a PCM pattern,
completion-pcm--pattern->regex previously would generate one
instance of [^z-a]* for each of those wildcards, even if the
wildcards were adjacent and could therefore be matched by a
single [^z-a]*. This can make regex matching performance much
worse. For example, with a minibuffer containing "*/" with
point at the start, completion-pcm-all-completions would take
several seconds to match in project-find-file in the Emacs repo.
Now, we run completion-pcm--pattern->segments on the pattern
first, which adds additional structure to the pattern, including
consolidating adjacent regexes into a single sublist. Then
completion-pcm--segments->regex generates a single regex
wildcard for each of those pattern wildcards. As a consequence,
we need to update the callers of completion-pcm--pattern->regex
which pass non-nil GROUP. This provides a substantial
performance improvement in various edge cases.
* lisp/minibuffer.el (completion-pcm--pattern->segments)
(completion-pcm--segments->regex): Add. (bug#79693)
(completion-pcm--pattern->regex)
(completion-pcm--merge-completions): Call pattern->segments and
segments->regex.
(completion-pcm--pattern-point-idx): Delete.
(completion-pcm--segments-point-idx): Add.
(completion-pcm--hilit-commonality): Call
completion-pcm--segments-point-idx to find the submatch
containing point.
* test/lisp/minibuffer-tests.el (completion-pcm-test-5): Add
more tests of highlighting the first difference.
(completion-pcm-test-pattern->regex): Add tests showing the
regex form of PCM patterns.
---
lisp/minibuffer.el | 112 +++++++++++++++++++---------------
test/lisp/minibuffer-tests.el | 38 +++++++++++-
2 files changed, 100 insertions(+), 50 deletions(-)
diff --git a/lisp/minibuffer.el b/lisp/minibuffer.el
index 9a78f631e02..4e6ddd928f4 100644
--- a/lisp/minibuffer.el
+++ b/lisp/minibuffer.el
@@ -4291,36 +4291,59 @@ completion-pcm--optimize-pattern
(_ (push (pop p) n))))
(nreverse n)))
+(defun completion-pcm--pattern->segments (pattern)
+ "Segment PATTERN into more structured sublists.
+
+Returns a list of lists which when concatenated is semantically the same
+as PATTERN.
+
+The first element in each sublist is a (possibly empty) string. The
+remaining elements in the sublist are all wildcard symbols. If PATTERN
+ends with a wildcard, then each sublist is guaranteed to have at least
+one wildcard."
+ (let (ret)
+ (while pattern
+ (let ((fixed "")
+ wildcards)
+ ;; Pop strings from PATTERN and concatenate them.
+ (while (stringp (car-safe pattern))
+ (setq fixed (concat fixed (pop pattern))))
+ ;; Pop wildcards from PATTERN.
+ (while (and pattern (symbolp (car-safe pattern)))
+ (push (pop pattern) wildcards))
+ ;; The sublist is a fixed string followed by all the wildcards.
+ (push (cons fixed (nreverse wildcards)) ret)))
+ (nreverse ret)))
+
+(defun completion-pcm--segments->regex (segments &optional group)
+ (concat "\\`"
+ (mapconcat
+ (lambda (segment)
+ (concat
+ (regexp-quote (car segment))
+ (when (cdr segment)
+ (concat
+ (when group "\\(")
+ (if (cl-every (lambda (x) (eq x 'any-delim)) (cdr segment))
+ (concat completion-pcm--delim-wild-regex "*?")
+ "[^z-a]*?")
+ (when group "\\)")))))
+ segments
+ "")))
+
(defun completion-pcm--pattern->regex (pattern &optional group)
- (let ((re
- (concat "\\`"
- (mapconcat
- (lambda (x)
- (cond
- ((stringp x) (regexp-quote x))
- (t
- (let ((re (if (eq x 'any-delim)
- (concat completion-pcm--delim-wild-regex "*?")
- "[^z-a]*?")))
- (if (if (consp group) (memq x group) group)
- (concat "\\(" re "\\)")
- re)))))
- pattern
- ""))))
- ;; Avoid pathological backtracking.
- (while (string-match "\\.\\*\\?\\(?:\\\\[()]\\)*\\(\\.\\*\\?\\)" re)
- (setq re (replace-match "" t t re 1)))
- re))
-
-(defun completion-pcm--pattern-point-idx (pattern)
- "Return index of subgroup corresponding to `point' element of PATTERN.
-Return nil if there's no such element."
+ (completion-pcm--segments->regex (completion-pcm--pattern->segments pattern) group))
+
+(defun completion-pcm--segments-point-idx (segments)
+ "Return index of subgroup corresponding to `point' element of SEGMENTS.
+Return nil if there's no such element.
+This is used in combination with `completion-pcm--segments->regex'."
(let ((idx nil)
(i 0))
- (dolist (x pattern)
- (unless (stringp x)
- (incf i)
- (if (eq x 'point) (setq idx i))))
+ (dolist (x segments)
+ (incf i)
+ (when (memq 'point (cdr x))
+ (setq idx i)))
idx))
(defun completion-pcm--all-completions (prefix pattern table pred)
@@ -4551,8 +4574,9 @@ completion-pcm--hilit-commonality
completion-lazy-hilit-fn nil)
(cond
((and completions (cl-loop for e in pattern thereis (stringp e)))
- (let* ((re (completion-pcm--pattern->regex pattern 'group))
- (point-idx (completion-pcm--pattern-point-idx pattern)))
+ (let* ((segments (completion-pcm--pattern->segments pattern))
+ (re (completion-pcm--segments->regex segments 'group))
+ (point-idx (completion-pcm--segments-point-idx segments)))
(setq completion-pcm--regexp re)
(cond (completion-lazy-hilit
(setq completion-lazy-hilit-fn
@@ -4696,12 +4720,13 @@ completion-pcm--merge-completions
(cond
((null (cdr strs)) (list (car strs)))
(t
- (let ((re (completion-pcm--pattern->regex pattern 'group))
+ (let ((segmented (completion-pcm--pattern->segments (append pattern '(any))))
(ccs ())) ;Chopped completions.
;; First chop each string into the parts corresponding to each
;; non-constant element of `pattern', using regexp-matching.
- (let ((case-fold-search completion-ignore-case))
+ (let ((re (concat (completion-pcm--segments->regex segmented t) "\\'"))
+ (case-fold-search completion-ignore-case))
(dolist (str strs)
(unless (string-match re str)
(error "Internal error: %s doesn't match %s" str re))
@@ -4713,24 +4738,15 @@ completion-pcm--merge-completions
(push (substring str last next) chopped)
(setq last next)
(setq i (1+ i)))
- ;; Add the text corresponding to the implicit trailing `any'.
- (push (substring str last) chopped)
(push (nreverse chopped) ccs))))
;; Then for each of those non-constant elements, extract the
;; commonality between them.
- (let ((res ())
- (fixed "")
- ;; Accumulate each stretch of wildcards, and process them as a unit.
- (wildcards ()))
- ;; Make the implicit trailing `any' explicit.
- (dolist (elem (append pattern '(any)))
- (if (stringp elem)
- (progn
- (setq fixed (concat fixed elem))
- (setq wildcards nil))
+ (let ((res ()))
+ (dolist (elem segmented)
+ (let ((fixed (car elem))
+ (wildcards (cdr elem)))
(let ((comps ()))
- (push elem wildcards)
(dolist (cc (prog1 ccs (setq ccs nil)))
(push (car cc) comps)
(push (cdr cc) ccs))
@@ -4768,7 +4784,8 @@ completion-pcm--merge-completions
(push prefix res)
;; Push all the wildcards in this stretch, to preserve `point' and
;; `star' wildcards before ELEM.
- (setq res (append wildcards res))
+ (dolist (wildcard wildcards)
+ (push wildcard res))
;; Extract common suffix additionally to common prefix.
;; Don't do it for `any' since it could lead to a merged
;; completion that doesn't itself match the candidates.
@@ -4786,10 +4803,7 @@ completion-pcm--merge-completions
comps))))))
(cl-assert (stringp suffix))
(unless (equal suffix "")
- (push suffix res))))
- ;; We pushed these wildcards on RES, so we're done with them.
- (setq wildcards nil))
- (setq fixed "")))))
+ (push suffix res)))))))))
;; We return it in reverse order.
res)))))
diff --git a/test/lisp/minibuffer-tests.el b/test/lisp/minibuffer-tests.el
index c7b24a4928d..8114899c0ff 100644
--- a/test/lisp/minibuffer-tests.el
+++ b/test/lisp/minibuffer-tests.el
@@ -268,7 +268,29 @@ completion-pcm-test-5
(should (null
(completion--pcm-first-difference-pos
(car (completion-pcm-all-completions
- "f" '("few" "many") nil 0))))))
+ "f" '("few" "many") nil 0)))))
+ (should (equal
+ (completion--pcm-first-difference-pos
+ (car (completion-pcm-all-completions
+ "a*" '("ab" "ac") nil 2)))
+ 1))
+ (should (equal
+ (completion--pcm-first-difference-pos
+ (car (completion-pcm-all-completions
+ "a*" '("ab" "ac") nil 1)))
+ 1))
+ (should (equal
+ (completion--pcm-first-difference-pos
+ (car (completion-pcm-all-completions
+ "a*x" '("abx" "acx") nil 2)))
+ 1))
+ (should (equal
+ (completion--pcm-first-difference-pos
+ (car (completion-pcm-all-completions
+ "a*x" '("abxd" "acxe") nil 3)))
+ ;; FIXME: the highlighting should start at the 4th character
+ ;; rather than the third.
+ 3)))
(ert-deftest completion-pcm-test-6 ()
;; Wildcards and delimiters work
@@ -339,6 +361,20 @@ completion-pcm-test-anydelim
"-x" '("-_.x" "-__x") nil 2)
'("-_x" . 3))))
+(ert-deftest completion-pcm-test-pattern->regex ()
+ (should (equal (completion-pcm--pattern->regex
+ '("A" any prefix "B" point "C"))
+ "\\`A[^z-a]*?B[^z-a]*?C"))
+ (should (equal (completion-pcm--pattern->regex
+ '(any any-delim prefix "A" "B" "C"))
+ "\\`[^z-a]*?ABC"))
+ (should (equal (completion-pcm--pattern->regex
+ '(any-delim "A" "B" star "C"))
+ "\\`[-_./:| *]*?AB[^z-a]*?C"))
+ (should (equal (completion-pcm--pattern->regex
+ '(any "A" any-delim "B" any-delim "C" any))
+ "\\`[^z-a]*?A[-_./:| *]*?B[-_./:| *]*?C[^z-a]*?")))
+
(ert-deftest completion-pcm-bug4219 ()
;; With `completion-ignore-case', try-completion should change the
;; case of existing text when the completions have different casing.
--
2.43.7
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Tue, 28 Oct 2025 12:58:02 GMT) Full text and rfc822 format available.Message #35 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Spencer Baugh <sbaugh <at> janestreet.com> Cc: dmitry <at> gutov.dev, 79693 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, juri <at> linkov.net Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Tue, 28 Oct 2025 14:56:45 +0200
> From: Spencer Baugh <sbaugh <at> janestreet.com> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, juri <at> linkov.net, > 79693 <at> debbugs.gnu.org, dmitry <at> gutov.dev > Date: Mon, 27 Oct 2025 17:51:24 -0400 > > Eli Zaretskii <eliz <at> gnu.org> writes: > > >> From: Stefan Monnier <monnier <at> iro.umontreal.ca> > >> Cc: Spencer Baugh <sbaugh <at> janestreet.com>, dmitry <at> gutov.dev, > >> 79693 <at> debbugs.gnu.org, juri <at> linkov.net > >> Date: Mon, 27 Oct 2025 14:33:03 -0400 > >> > >> >> I believe we are unlikely to have performance regressions because the > >> >> regexps are getting strictly simpler. The only thing changing about > >> >> them is the removal of "[^z-a]*?" instances where there were previously > >> >> multiple of those in sequence. > >> >> > >> >> I think it's highly unlikely that removing "[^z-a]*?" from a regexp will > >> >> make it run slower. > >> > > >> > This kind of reasoning tends to fail us time and again. > >> > > >> > Is it possible to come up with some decent body of tests in various > >> > scenarios, to give us more confidence that we don't favor one use case > >> > at the expense of others? > >> > >> Not sure if you're worried about semantics or performance, but in terms > >> of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*". > >> I think it would take a *really* weird corner case to show otherwise > >> (like maybe some situation where the extra work performed for > >> "[^z-a]*[^z-a]*" causes extra paging which pushes some other task out of > >> its working set and thus indirectly ends up giving Emacs more CPU > >> time?). > > > > But the patch does much more that replace "[^z-a]*[^z-a]*" with "[^z-a]*"! > > Okay, that's reasonable: it's not immediately obvious that that's the > only thing the patch does, there could easily be bugs in the patch which > make it do something else too. > > So, I have now also added some tests which verify that the generated > regexes have the expected form. > > If you run these tests on an old version, you can see that the only > effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?", or > "[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?". > > I think that's enough to be confident it's right. Thanks. Would it make sense to leave the code alone, and just add some simple replacement code at the end which replaces each instance of "[^z-a]*[^z-a]*" with "[^z-a]*"? That'd be even safer, IMO.
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Tue, 28 Oct 2025 13:00:02 GMT) Full text and rfc822 format available.Message #38 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Spencer Baugh <sbaugh <at> janestreet.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 79693 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>, Juri Linkov <juri <at> linkov.net> Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Tue, 28 Oct 2025 08:59:07 -0400
[Message part 1 (text/plain, inline)]
On Tue, Oct 28, 2025, 8:56 AM Eli Zaretskii <eliz <at> gnu.org> wrote: > > From: Spencer Baugh <sbaugh <at> janestreet.com> > > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, juri <at> linkov.net, > > 79693 <at> debbugs.gnu.org, dmitry <at> gutov.dev > > Date: Mon, 27 Oct 2025 17:51:24 -0400 > > > > Eli Zaretskii <eliz <at> gnu.org> writes: > > > > >> From: Stefan Monnier <monnier <at> iro.umontreal.ca> > > >> Cc: Spencer Baugh <sbaugh <at> janestreet.com>, dmitry <at> gutov.dev, > > >> 79693 <at> debbugs.gnu.org, juri <at> linkov.net > > >> Date: Mon, 27 Oct 2025 14:33:03 -0400 > > >> > > >> >> I believe we are unlikely to have performance regressions because > the > > >> >> regexps are getting strictly simpler. The only thing changing > about > > >> >> them is the removal of "[^z-a]*?" instances where there were > previously > > >> >> multiple of those in sequence. > > >> >> > > >> >> I think it's highly unlikely that removing "[^z-a]*?" from a > regexp will > > >> >> make it run slower. > > >> > > > >> > This kind of reasoning tends to fail us time and again. > > >> > > > >> > Is it possible to come up with some decent body of tests in various > > >> > scenarios, to give us more confidence that we don't favor one use > case > > >> > at the expense of others? > > >> > > >> Not sure if you're worried about semantics or performance, but in > terms > > >> of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*". > > >> I think it would take a *really* weird corner case to show otherwise > > >> (like maybe some situation where the extra work performed for > > >> "[^z-a]*[^z-a]*" causes extra paging which pushes some other task out > of > > >> its working set and thus indirectly ends up giving Emacs more CPU > > >> time?). > > > > > > But the patch does much more that replace "[^z-a]*[^z-a]*" with > "[^z-a]*"! > > > > Okay, that's reasonable: it's not immediately obvious that that's the > > only thing the patch does, there could easily be bugs in the patch which > > make it do something else too. > > > > So, I have now also added some tests which verify that the generated > > regexes have the expected form. > > > > If you run these tests on an old version, you can see that the only > > effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?", or > > "[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?". > > > > I think that's enough to be confident it's right. > > Thanks. Would it make sense to leave the code alone, and just add > some simple replacement code at the end which replaces each instance > of "[^z-a]*[^z-a]*" with "[^z-a]*"? That'd be even safer, IMO. > Unfortunately that's not possible because it changes the structure of the capture groups, and that's what requires the other changes. >
[Message part 2 (text/html, inline)]
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Tue, 28 Oct 2025 13:19:02 GMT) Full text and rfc822 format available.Message #41 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Spencer Baugh <sbaugh <at> janestreet.com> Cc: dmitry <at> gutov.dev, 79693 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, juri <at> linkov.net Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Tue, 28 Oct 2025 15:18:23 +0200
> From: Spencer Baugh <sbaugh <at> janestreet.com> > Date: Tue, 28 Oct 2025 08:59:07 -0400 > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, Juri Linkov <juri <at> linkov.net>, 79693 <at> debbugs.gnu.org, > Dmitry Gutov <dmitry <at> gutov.dev> > > > If you run these tests on an old version, you can see that the only > > effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?", or > > "[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?". > > > > I think that's enough to be confident it's right. > > Thanks. Would it make sense to leave the code alone, and just add > some simple replacement code at the end which replaces each instance > of "[^z-a]*[^z-a]*" with "[^z-a]*"? That'd be even safer, IMO. > > Unfortunately that's not possible because it changes the structure of the capture groups, and that's what > requires the other changes. Too bad. I guess we will have to keep our fingers crossed, as there's nothing better.
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Wed, 29 Oct 2025 20:33:02 GMT) Full text and rfc822 format available.Message #44 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Spencer Baugh <sbaugh <at> janestreet.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 79693 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>, Juri Linkov <juri <at> linkov.net> Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Wed, 29 Oct 2025 16:32:31 -0400
[Message part 1 (text/plain, inline)]
So then we're good to install this change, right? Could someone please do that for me? On Tue, Oct 28, 2025, 9:18 AM Eli Zaretskii <eliz <at> gnu.org> wrote: > > From: Spencer Baugh <sbaugh <at> janestreet.com> > > Date: Tue, 28 Oct 2025 08:59:07 -0400 > > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, Juri Linkov < > juri <at> linkov.net>, 79693 <at> debbugs.gnu.org, > > Dmitry Gutov <dmitry <at> gutov.dev> > > > > > If you run these tests on an old version, you can see that the only > > > effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?", > or > > > "[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?". > > > > > > I think that's enough to be confident it's right. > > > > Thanks. Would it make sense to leave the code alone, and just add > > some simple replacement code at the end which replaces each instance > > of "[^z-a]*[^z-a]*" with "[^z-a]*"? That'd be even safer, IMO. > > > > Unfortunately that's not possible because it changes the structure of > the capture groups, and that's what > > requires the other changes. > > Too bad. I guess we will have to keep our fingers crossed, as there's > nothing better. >
[Message part 2 (text/html, inline)]
bug-gnu-emacs <at> gnu.org:bug#79693; Package emacs.
(Thu, 30 Oct 2025 17:29:01 GMT) Full text and rfc822 format available.Message #47 received at 79693 <at> debbugs.gnu.org (full text, mbox):
From: Juri Linkov <juri <at> linkov.net> To: Spencer Baugh <sbaugh <at> janestreet.com> Cc: 79693 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>, Dmitry Gutov <dmitry <at> gutov.dev> Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent wildcards Date: Thu, 30 Oct 2025 19:25:21 +0200
> So then we're good to install this change, right? Could someone please do
> that for me?
I tested this change and found no problems, so now it's pushed.
I wonder is it ok to close this report given this note in the patch:
;; FIXME: the highlighting should start at the 4th character
;; rather than the third.
But I don't see where is the problem:
(completion-pcm-all-completions
"a*x" '("abxd" "acxe") nil 3)
=>
(#("abxd"
0 1 (face completions-common-part)
2 3 (face completions-common-part)
3 4 (face completions-first-difference))
#("acxe"
0 1 (face completions-common-part)
2 3 (face completions-common-part)
3 4 (face completions-first-difference))
. 0)
Here highlighting with 'completions-first-difference'
starts at the 4th character.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.