GNU bug report logs - #79693
[PATCH] Optimize PCM regex to not contain adjacent wildcards

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Fri, 24 Oct 2025 21:53:02 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 79693 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to juri <at> linkov.net, monnier <at> iro.umontreal.ca, dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Fri, 24 Oct 2025 21:53:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Spencer Baugh <sbaugh <at> janestreet.com>:
New bug report received and forwarded. Copy sent to juri <at> linkov.net, monnier <at> iro.umontreal.ca, dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org. (Fri, 24 Oct 2025 21:53:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] Optimize PCM regex to not contain adjacent wildcards
Date: Fri, 24 Oct 2025 17:52:38 -0400

[Message part 1 (text/plain, inline)]

Tags: patch


For example, before my patch, the following in the Emacs repo takes 4
seconds on my machine, and afterwards it takes 5 milliseconds.
(progn
  (setq my-files (project-files (project-current)))
  (benchmark-run 1 (completion-pcm-all-completions "*/vc" my-files nil 0)))

In GNU Emacs 30.1.90 (build 65, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2025-10-23 built on
 igm-qws-u22796a
Repository revision: 36f12b133cdf7966901a2a28c96ff358658f545a
Repository branch: emacs-30
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.10 (Green Obsidian)

Configured using:
 'configure --with-x-toolkit=lucid --without-gpm --without-gconf
 --without-selinux --without-imagemagick --with-modules --with-gif=no
 --with-cairo --with-rsvg --without-compress-install --with-tree-sitter
 --with-native-compilation=aot
 PKG_CONFIG_PATH=/usr/local/home/garnish/libtree-sitter/0.22.6-1/lib/pkgconfig/'

[0001-Optimize-PCM-regex-to-not-contain-adjacent-wildcards.patch (text/patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Sat, 25 Oct 2025 08:26:02 GMT) Full text and rfc822 format available.

Message #8 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79693 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 juri <at> linkov.net
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Sat, 25 Oct 2025 11:25:24 +0300

> Cc: Juri Linkov <juri <at> linkov.net>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
>  Dmitry Gutov <dmitry <at> gutov.dev>
> Date: Fri, 24 Oct 2025 17:52:38 -0400
> From:  Spencer Baugh via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> For example, before my patch, the following in the Emacs repo takes 4
> seconds on my machine, and afterwards it takes 5 milliseconds.
> (progn
>   (setq my-files (project-files (project-current)))
>   (benchmark-run 1 (completion-pcm-all-completions "*/vc" my-files nil 0)))

The performance gain in that case is indeed impressive, but how do we
verify we don't get regressions in some other cases?  Rgexps are
tricky, and we don't seem to have enough tests for this.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Mon, 27 Oct 2025 01:59:02 GMT) Full text and rfc822 format available.

Message #11 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 79693 <at> debbugs.gnu.org,
 Juri Linkov <juri <at> linkov.net>
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Sun, 26 Oct 2025 21:58:15 -0400

> @@ -4291,36 +4291,64 @@ completion-pcm--optimize-pattern
>          (_ (push (pop p) n))))
>      (nreverse n)))
>  
> +(defun completion-pcm--pattern->segments (pattern)
> +  "Segment PATTERN into more structured sublists.
> +
> +Returns a list of lists which when concatenated is semantically the same
> +as PATTERN.
> +
> +The first element in each sublist is a (possibly empty) string.  The
> +remaining elements in the sublist are all wildcard symbols.  If PATTERN
> +ends with a wildcard, then each sublist is guaranteed to have at least
> +one wildcard.  Each sublist either contains no `all-delim' wildcards, or
> +only `all-delim' wildcards; this is useful since `all-delim' wildcards
> +need to be matched by a different regex than other wildcards."

`all-delim` => `any-delim`

> +  (let (ret)
> +    (while pattern
> +      (let ((fixed "")
> +            wildcards)
> +        ;; Pop strings from PATTERN and concatenate them.
> +        (while (stringp (car-safe pattern))
> +          (setq fixed (concat fixed (pop pattern))))
> +        ;; Pop wildcards of the same type from PATTERN.
> +        (if (eq (car-safe pattern) 'any-delim)
> +            (while (eq (car-safe pattern) 'any-delim)
> +              (push (pop pattern) wildcards))
> +          (while (and pattern
> +                      (symbolp (car-safe pattern))
> +                      (not (eq (car-safe pattern) 'any-delim)))
> +            (push (pop pattern) wildcards)))
> +        ;; The sublist is a fixed string followed by all the wildcards.
> +        (push (cons fixed (nreverse wildcards)) ret)))
> +    (nreverse ret)))

IIUC with a pattern like ("FOO" any-delim any), we generate a result of
the form

    (("FOO" any-delim)
     ("" any))

Wouldn't it make more sense to simplify it to

    (("FOO" any-delim any))

which can turn into a regexp like "FOO[z-a]*"?

> +(defun completion-pcm--segments->regex (segments &optional group)
> +  (concat "\\`"
> +          (mapconcat
> +           (lambda (segment)
> +             (concat
> +              (regexp-quote (car segment))
> +              (when group "\\(")
> +              (if (eq (cadr segment) 'any-delim)
> +                  (concat completion-pcm--delim-wild-regex "*?")
> +                "[^z-a]*?")
> +              (when group "\\)")))
> +           segments
> +           "")))

With the above, I guess the idea here would be to check (all (lambda (x)
(eq x 'any-delim)) (cdr segment)) instead of (eq (cadr segment)
'any-delim).

I guess you thought of that but it has problems, in which case please
add a comment in the code explaining the problem.

> +(defun completion-pcm--segments-point-idx (segments)
> +  "Return index of subgroup corresponding to `point' element of SEGMENTS.
> +Return nil if there's no such element.
> +This is used in combination with `completion-pcm--segments->regex'."
>    (let ((idx nil)
>          (i 0))
> -    (dolist (x pattern)
> -      (unless (stringp x)
> -        (incf i)
> -        (if (eq x 'point) (setq idx i))))
> +    (dolist (x segments)
> +      (incf i)
> +      (when (memq 'point (cdr x))
> +        (setq idx i)))
>      idx))

Hmm... so IIUC we don't distinguish segments of the form (STR point star)
from segments of the form (STR star point), right?
I'm wondering whether it's OK, or whether on the contrary we should be
careful about it because it affects the final position of point.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Mon, 27 Oct 2025 16:59:02 GMT) Full text and rfc822 format available.

Message #14 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, Juri Linkov <juri <at> linkov.net>,
 79693 <at> debbugs.gnu.org
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Mon, 27 Oct 2025 12:57:58 -0400

[Message part 1 (text/plain, inline)]

Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> @@ -4291,36 +4291,64 @@ completion-pcm--optimize-pattern
>>          (_ (push (pop p) n))))
>>      (nreverse n)))
>>  
>> +(defun completion-pcm--pattern->segments (pattern)
>> +  "Segment PATTERN into more structured sublists.
>> +
>> +Returns a list of lists which when concatenated is semantically the same
>> +as PATTERN.
>> +
>> +The first element in each sublist is a (possibly empty) string.  The
>> +remaining elements in the sublist are all wildcard symbols.  If PATTERN
>> +ends with a wildcard, then each sublist is guaranteed to have at least
>> +one wildcard.  Each sublist either contains no `all-delim' wildcards, or
>> +only `all-delim' wildcards; this is useful since `all-delim' wildcards
>> +need to be matched by a different regex than other wildcards."
>
> `all-delim` => `any-delim`

(Text removed in the next version)

>> +  (let (ret)
>> +    (while pattern
>> +      (let ((fixed "")
>> +            wildcards)
>> +        ;; Pop strings from PATTERN and concatenate them.
>> +        (while (stringp (car-safe pattern))
>> +          (setq fixed (concat fixed (pop pattern))))
>> +        ;; Pop wildcards of the same type from PATTERN.
>> +        (if (eq (car-safe pattern) 'any-delim)
>> +            (while (eq (car-safe pattern) 'any-delim)
>> +              (push (pop pattern) wildcards))
>> +          (while (and pattern
>> +                      (symbolp (car-safe pattern))
>> +                      (not (eq (car-safe pattern) 'any-delim)))
>> +            (push (pop pattern) wildcards)))
>> +        ;; The sublist is a fixed string followed by all the wildcards.
>> +        (push (cons fixed (nreverse wildcards)) ret)))
>> +    (nreverse ret)))
>
> IIUC with a pattern like ("FOO" any-delim any), we generate a result of
> the form
>
>     (("FOO" any-delim)
>      ("" any))
>
> Wouldn't it make more sense to simplify it to
>
>     (("FOO" any-delim any))
>
> which can turn into a regexp like "FOO[z-a]*"?

Yes, good catch!  That's both faster to run and simpler to code.  Done
in the attached patch.

>> +(defun completion-pcm--segments->regex (segments &optional group)
>> +  (concat "\\`"
>> +          (mapconcat
>> +           (lambda (segment)
>> +             (concat
>> +              (regexp-quote (car segment))
>> +              (when group "\\(")
>> +              (if (eq (cadr segment) 'any-delim)
>> +                  (concat completion-pcm--delim-wild-regex "*?")
>> +                "[^z-a]*?")
>> +              (when group "\\)")))
>> +           segments
>> +           "")))
>
> With the above, I guess the idea here would be to check (all (lambda (x)
> (eq x 'any-delim)) (cdr segment)) instead of (eq (cadr segment)
> 'any-delim).
>
> I guess you thought of that but it has problems, in which case please
> add a comment in the code explaining the problem.

Nope, I just didn't think of it. :)

>> +(defun completion-pcm--segments-point-idx (segments)
>> +  "Return index of subgroup corresponding to `point' element of SEGMENTS.
>> +Return nil if there's no such element.
>> +This is used in combination with `completion-pcm--segments->regex'."
>>    (let ((idx nil)
>>          (i 0))
>> -    (dolist (x pattern)
>> -      (unless (stringp x)
>> -        (incf i)
>> -        (if (eq x 'point) (setq idx i))))
>> +    (dolist (x segments)
>> +      (incf i)
>> +      (when (memq 'point (cdr x))
>> +        (setq idx i)))
>>      idx))
>
> Hmm... so IIUC we don't distinguish segments of the form (STR point star)
> from segments of the form (STR star point), right?
> I'm wondering whether it's OK, or whether on the contrary we should be
> careful about it because it affects the final position of point.

True, we do need to be careful about it.  In fact, this code was already
buggy before my change: if there was a * before point, we'd highlight
the wrong characters as the first difference.

But since this bug was pre-existing, probably I should fix it in a
separate patch.  So for now I've just added a test showing the bug.
(Since it only affects highlighting, it's not that critical)

[0001-Optimize-PCM-regex-to-not-contain-adjacent-wildcards.patch (text/x-patch, inline)]

From bc9e8f8d715862ca799602f015c42d25b98d9d86 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Fri, 24 Oct 2025 15:58:44 -0400
Subject: [PATCH] Optimize PCM regex to not contain adjacent wildcards

When multiple wildcards occur in a PCM pattern,
completion-pcm--pattern->regex previously would generate one
instance of [^z-a]* for each of those wildcards, even if the
wildcards were adjacent and could therefore be matched by a
single [^z-a]*.  This can make regex matching performance much
worse.  For example, with a minibuffer containing "*/" with
point at the start, completion-pcm-all-completions would take
several seconds to match in project-find-file in the Emacs repo.

Now, we run completion-pcm--pattern->segments on the pattern
first, which adds additional structure to the pattern, including
consolidating adjacent regexes into a single sublist.  Then
completion-pcm--segments->regex generates a single regex
wildcard for each of those pattern wildcards.  As a consequence,
we need to update the callers of completion-pcm--pattern->regex
which pass non-nil GROUP.  This provides a substantial
performance improvement in various edge cases.

* lisp/minibuffer.el (completion-pcm--pattern->segments)
(completion-pcm--segments->regex): Add.  (bug#79693)
(completion-pcm--pattern->regex)
(completion-pcm--merge-completions): Call pattern->segments and
segments->regex.
(completion-pcm--pattern-point-idx): Delete.
(completion-pcm--segments-point-idx): Add.
(completion-pcm--hilit-commonality): Call
completion-pcm--segments-point-idx to find the submatch
containing point.
* test/lisp/minibuffer-tests.el (completion-pcm-test-5): Add
more tests of highlighting the first difference.
---
 lisp/minibuffer.el            | 110 +++++++++++++++++++---------------
 test/lisp/minibuffer-tests.el |  24 +++++++-
 2 files changed, 84 insertions(+), 50 deletions(-)

diff --git a/lisp/minibuffer.el b/lisp/minibuffer.el
index 9a78f631e02..678e4b1f104 100644
--- a/lisp/minibuffer.el
+++ b/lisp/minibuffer.el
@@ -4291,36 +4291,57 @@ completion-pcm--optimize-pattern
         (_ (push (pop p) n))))
     (nreverse n)))
 
+(defun completion-pcm--pattern->segments (pattern)
+  "Segment PATTERN into more structured sublists.
+
+Returns a list of lists which when concatenated is semantically the same
+as PATTERN.
+
+The first element in each sublist is a (possibly empty) string.  The
+remaining elements in the sublist are all wildcard symbols.  If PATTERN
+ends with a wildcard, then each sublist is guaranteed to have at least
+one wildcard."
+  (let (ret)
+    (while pattern
+      (let ((fixed "")
+            wildcards)
+        ;; Pop strings from PATTERN and concatenate them.
+        (while (stringp (car-safe pattern))
+          (setq fixed (concat fixed (pop pattern))))
+        ;; Pop wildcards from PATTERN.
+        (while (and pattern (symbolp (car-safe pattern)))
+          (push (pop pattern) wildcards))
+        ;; The sublist is a fixed string followed by all the wildcards.
+        (push (cons fixed (nreverse wildcards)) ret)))
+    (nreverse ret)))
+
+(defun completion-pcm--segments->regex (segments &optional group)
+  (concat "\\`"
+          (mapconcat
+           (lambda (segment)
+             (concat
+              (regexp-quote (car segment))
+              (when group "\\(")
+              (if (cl-every (lambda (x) (eq x 'any-delim)) (cdr segment))
+                  (concat completion-pcm--delim-wild-regex "*?")
+                "[^z-a]*?")
+              (when group "\\)")))
+           segments
+           "")))
+
 (defun completion-pcm--pattern->regex (pattern &optional group)
-  (let ((re
-         (concat "\\`"
-                 (mapconcat
-                  (lambda (x)
-                    (cond
-                     ((stringp x) (regexp-quote x))
-                     (t
-                      (let ((re (if (eq x 'any-delim)
-                                    (concat completion-pcm--delim-wild-regex "*?")
-                                  "[^z-a]*?")))
-                        (if (if (consp group) (memq x group) group)
-                            (concat "\\(" re "\\)")
-                          re)))))
-                  pattern
-                  ""))))
-    ;; Avoid pathological backtracking.
-    (while (string-match "\\.\\*\\?\\(?:\\\\[()]\\)*\\(\\.\\*\\?\\)" re)
-      (setq re (replace-match "" t t re 1)))
-    re))
-
-(defun completion-pcm--pattern-point-idx (pattern)
-  "Return index of subgroup corresponding to `point' element of PATTERN.
-Return nil if there's no such element."
+  (completion-pcm--segments->regex (completion-pcm--pattern->segments pattern) group))
+
+(defun completion-pcm--segments-point-idx (segments)
+  "Return index of subgroup corresponding to `point' element of SEGMENTS.
+Return nil if there's no such element.
+This is used in combination with `completion-pcm--segments->regex'."
   (let ((idx nil)
         (i 0))
-    (dolist (x pattern)
-      (unless (stringp x)
-        (incf i)
-        (if (eq x 'point) (setq idx i))))
+    (dolist (x segments)
+      (incf i)
+      (when (memq 'point (cdr x))
+        (setq idx i)))
     idx))
 
 (defun completion-pcm--all-completions (prefix pattern table pred)
@@ -4551,8 +4572,9 @@ completion-pcm--hilit-commonality
         completion-lazy-hilit-fn nil)
   (cond
    ((and completions (cl-loop for e in pattern thereis (stringp e)))
-    (let* ((re (completion-pcm--pattern->regex pattern 'group))
-           (point-idx (completion-pcm--pattern-point-idx pattern)))
+    (let* ((segments (completion-pcm--pattern->segments pattern))
+           (re (completion-pcm--segments->regex segments 'group))
+           (point-idx (completion-pcm--segments-point-idx segments)))
       (setq completion-pcm--regexp re)
       (cond (completion-lazy-hilit
              (setq completion-lazy-hilit-fn
@@ -4696,12 +4718,13 @@ completion-pcm--merge-completions
   (cond
    ((null (cdr strs)) (list (car strs)))
    (t
-    (let ((re (completion-pcm--pattern->regex pattern 'group))
+    (let ((segmented (completion-pcm--pattern->segments (append pattern '(any))))
           (ccs ()))                     ;Chopped completions.
 
       ;; First chop each string into the parts corresponding to each
       ;; non-constant element of `pattern', using regexp-matching.
-      (let ((case-fold-search completion-ignore-case))
+      (let ((re (concat (completion-pcm--segments->regex segmented t) "\\'"))
+            (case-fold-search completion-ignore-case))
         (dolist (str strs)
           (unless (string-match re str)
             (error "Internal error: %s doesn't match %s" str re))
@@ -4713,24 +4736,15 @@ completion-pcm--merge-completions
               (push (substring str last next) chopped)
               (setq last next)
               (setq i (1+ i)))
-            ;; Add the text corresponding to the implicit trailing `any'.
-            (push (substring str last) chopped)
             (push (nreverse chopped) ccs))))
 
       ;; Then for each of those non-constant elements, extract the
       ;; commonality between them.
-      (let ((res ())
-            (fixed "")
-            ;; Accumulate each stretch of wildcards, and process them as a unit.
-            (wildcards ()))
-        ;; Make the implicit trailing `any' explicit.
-        (dolist (elem (append pattern '(any)))
-          (if (stringp elem)
-              (progn
-                (setq fixed (concat fixed elem))
-                (setq wildcards nil))
+      (let ((res ()))
+        (dolist (elem segmented)
+          (let ((fixed (car elem))
+                (wildcards (cdr elem)))
             (let ((comps ()))
-              (push elem wildcards)
               (dolist (cc (prog1 ccs (setq ccs nil)))
                 (push (car cc) comps)
                 (push (cdr cc) ccs))
@@ -4768,7 +4782,8 @@ completion-pcm--merge-completions
                   (push prefix res)
                   ;; Push all the wildcards in this stretch, to preserve `point' and
                   ;; `star' wildcards before ELEM.
-                  (setq res (append wildcards res))
+                  (dolist (wildcard wildcards)
+                    (push wildcard res))
                   ;; Extract common suffix additionally to common prefix.
                   ;; Don't do it for `any' since it could lead to a merged
                   ;; completion that doesn't itself match the candidates.
@@ -4786,10 +4801,7 @@ completion-pcm--merge-completions
                                         comps))))))
                       (cl-assert (stringp suffix))
                       (unless (equal suffix "")
-                        (push suffix res))))
-                  ;; We pushed these wildcards on RES, so we're done with them.
-                  (setq wildcards nil))
-                (setq fixed "")))))
+                        (push suffix res)))))))))
         ;; We return it in reverse order.
         res)))))
 
diff --git a/test/lisp/minibuffer-tests.el b/test/lisp/minibuffer-tests.el
index c7b24a4928d..ca4f9f7ced5 100644
--- a/test/lisp/minibuffer-tests.el
+++ b/test/lisp/minibuffer-tests.el
@@ -268,7 +268,29 @@ completion-pcm-test-5
   (should (null
            (completion--pcm-first-difference-pos
             (car (completion-pcm-all-completions
-                  "f" '("few" "many") nil 0))))))
+                  "f" '("few" "many") nil 0)))))
+  (should (equal
+           (completion--pcm-first-difference-pos
+            (car (completion-pcm-all-completions
+                  "a*" '("ab" "ac") nil 2)))
+           1))
+  (should (equal
+           (completion--pcm-first-difference-pos
+            (car (completion-pcm-all-completions
+                  "a*" '("ab" "ac") nil 1)))
+           1))
+  (should (equal
+           (completion--pcm-first-difference-pos
+            (car (completion-pcm-all-completions
+                  "a*x" '("abx" "acx") nil 2)))
+           1))
+  (should (equal
+           (completion--pcm-first-difference-pos
+            (car (completion-pcm-all-completions
+                  "a*x" '("abxd" "acxe") nil 3)))
+           ;; FIXME: the highlighting should start at the 4th character
+           ;; rather than the third.
+           1)))
 
 (ert-deftest completion-pcm-test-6 ()
   ;; Wildcards and delimiters work
-- 
2.43.7

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Mon, 27 Oct 2025 17:02:01 GMT) Full text and rfc822 format available.

Message #17 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, juri <at> linkov.net, monnier <at> iro.umontreal.ca,
 79693 <at> debbugs.gnu.org
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Mon, 27 Oct 2025 13:01:19 -0400

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Cc: Juri Linkov <juri <at> linkov.net>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
>>  Dmitry Gutov <dmitry <at> gutov.dev>
>> Date: Fri, 24 Oct 2025 17:52:38 -0400
>> From:  Spencer Baugh via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>> 
>> For example, before my patch, the following in the Emacs repo takes 4
>> seconds on my machine, and afterwards it takes 5 milliseconds.
>> (progn
>>   (setq my-files (project-files (project-current)))
>>   (benchmark-run 1 (completion-pcm-all-completions "*/vc" my-files nil 0)))
>
> The performance gain in that case is indeed impressive, but how do we
> verify we don't get regressions in some other cases?  Rgexps are
> tricky, and we don't seem to have enough tests for this.

I'm not observing any regressions in my usage.

I believe we are unlikely to have performance regressions because the
regexps are getting strictly simpler.  The only thing changing about
them is the removal of "[^z-a]*?" instances where there were previously
multiple of those in sequence.

I think it's highly unlikely that removing "[^z-a]*?" from a regexp will
make it run slower.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Mon, 27 Oct 2025 17:45:02 GMT) Full text and rfc822 format available.

Message #20 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, juri <at> linkov.net, monnier <at> iro.umontreal.ca,
 79693 <at> debbugs.gnu.org
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Mon, 27 Oct 2025 19:43:58 +0200

> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: dmitry <at> gutov.dev,  79693 <at> debbugs.gnu.org,  monnier <at> iro.umontreal.ca,
>    juri <at> linkov.net
> Date: Mon, 27 Oct 2025 13:01:19 -0400
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > The performance gain in that case is indeed impressive, but how do we
> > verify we don't get regressions in some other cases?  Rgexps are
> > tricky, and we don't seem to have enough tests for this.
> 
> I'm not observing any regressions in my usage.

That's good to hear, but I'm quite sure there are many other usage
patterns.

> I believe we are unlikely to have performance regressions because the
> regexps are getting strictly simpler.  The only thing changing about
> them is the removal of "[^z-a]*?" instances where there were previously
> multiple of those in sequence.
> 
> I think it's highly unlikely that removing "[^z-a]*?" from a regexp will
> make it run slower.

This kind of reasoning tends to fail us time and again.

Is it possible to come up with some decent body of tests in various
scenarios, to give us more confidence that we don't favor one use case
at the expense of others?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Mon, 27 Oct 2025 18:34:02 GMT) Full text and rfc822 format available.

Message #23 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, juri <at> linkov.net,
 79693 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Mon, 27 Oct 2025 14:33:03 -0400

>> I believe we are unlikely to have performance regressions because the
>> regexps are getting strictly simpler.  The only thing changing about
>> them is the removal of "[^z-a]*?" instances where there were previously
>> multiple of those in sequence.
>> 
>> I think it's highly unlikely that removing "[^z-a]*?" from a regexp will
>> make it run slower.
>
> This kind of reasoning tends to fail us time and again.
>
> Is it possible to come up with some decent body of tests in various
> scenarios, to give us more confidence that we don't favor one use case
> at the expense of others?

Not sure if you're worried about semantics or performance, but in terms
of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*".
I think it would take a *really* weird corner case to show otherwise
(like maybe some situation where the extra work performed for
"[^z-a]*[^z-a]*" causes extra paging which pushes some other task out of
its working set and thus indirectly ends up giving Emacs more CPU
time?).


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Mon, 27 Oct 2025 18:44:02 GMT) Full text and rfc822 format available.

Message #26 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: sbaugh <at> janestreet.com, juri <at> linkov.net, 79693 <at> debbugs.gnu.org,
 dmitry <at> gutov.dev
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Mon, 27 Oct 2025 20:42:54 +0200

> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Spencer Baugh <sbaugh <at> janestreet.com>,  dmitry <at> gutov.dev,
>   79693 <at> debbugs.gnu.org,  juri <at> linkov.net
> Date: Mon, 27 Oct 2025 14:33:03 -0400
> 
> >> I believe we are unlikely to have performance regressions because the
> >> regexps are getting strictly simpler.  The only thing changing about
> >> them is the removal of "[^z-a]*?" instances where there were previously
> >> multiple of those in sequence.
> >> 
> >> I think it's highly unlikely that removing "[^z-a]*?" from a regexp will
> >> make it run slower.
> >
> > This kind of reasoning tends to fail us time and again.
> >
> > Is it possible to come up with some decent body of tests in various
> > scenarios, to give us more confidence that we don't favor one use case
> > at the expense of others?
> 
> Not sure if you're worried about semantics or performance, but in terms
> of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*".
> I think it would take a *really* weird corner case to show otherwise
> (like maybe some situation where the extra work performed for
> "[^z-a]*[^z-a]*" causes extra paging which pushes some other task out of
> its working set and thus indirectly ends up giving Emacs more CPU
> time?).

But the patch does much more that replace "[^z-a]*[^z-a]*" with "[^z-a]*"!

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Mon, 27 Oct 2025 21:03:01 GMT) Full text and rfc822 format available.

Message #29 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: sbaugh <at> janestreet.com, juri <at> linkov.net, 79693 <at> debbugs.gnu.org,
 dmitry <at> gutov.dev
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Mon, 27 Oct 2025 17:01:55 -0400

> But the patch does much more that replace "[^z-a]*[^z-a]*" with "[^z-a]*"!

Indeed.  I don't see anything that worries me performance-wise in there,
but of course any change is a risk (contrarily to the previous change
that Spencer had proposed in that area).


        Stefan "in favor of Spencer's patch, in case there was any doubt"

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Mon, 27 Oct 2025 21:52:01 GMT) Full text and rfc822 format available.

Message #32 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 79693 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, juri <at> linkov.net
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Mon, 27 Oct 2025 17:51:24 -0400

[Message part 1 (text/plain, inline)]

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
>> Cc: Spencer Baugh <sbaugh <at> janestreet.com>,  dmitry <at> gutov.dev,
>>   79693 <at> debbugs.gnu.org,  juri <at> linkov.net
>> Date: Mon, 27 Oct 2025 14:33:03 -0400
>> 
>> >> I believe we are unlikely to have performance regressions because the
>> >> regexps are getting strictly simpler.  The only thing changing about
>> >> them is the removal of "[^z-a]*?" instances where there were previously
>> >> multiple of those in sequence.
>> >> 
>> >> I think it's highly unlikely that removing "[^z-a]*?" from a regexp will
>> >> make it run slower.
>> >
>> > This kind of reasoning tends to fail us time and again.
>> >
>> > Is it possible to come up with some decent body of tests in various
>> > scenarios, to give us more confidence that we don't favor one use case
>> > at the expense of others?
>> 
>> Not sure if you're worried about semantics or performance, but in terms
>> of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*".
>> I think it would take a *really* weird corner case to show otherwise
>> (like maybe some situation where the extra work performed for
>> "[^z-a]*[^z-a]*" causes extra paging which pushes some other task out of
>> its working set and thus indirectly ends up giving Emacs more CPU
>> time?).
>
> But the patch does much more that replace "[^z-a]*[^z-a]*" with "[^z-a]*"!

Okay, that's reasonable: it's not immediately obvious that that's the
only thing the patch does, there could easily be bugs in the patch which
make it do something else too.

So, I have now also added some tests which verify that the generated
regexes have the expected form.

If you run these tests on an old version, you can see that the only
effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?", or
"[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?".

I think that's enough to be confident it's right.

[0001-Optimize-PCM-regex-to-not-contain-adjacent-wildcards.patch (text/x-patch, inline)]

From d262787082eca14455b02a3868c70f9e11835cb8 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Fri, 24 Oct 2025 15:58:44 -0400
Subject: [PATCH] Optimize PCM regex to not contain adjacent wildcards

When multiple wildcards occur in a PCM pattern,
completion-pcm--pattern->regex previously would generate one
instance of [^z-a]* for each of those wildcards, even if the
wildcards were adjacent and could therefore be matched by a
single [^z-a]*.  This can make regex matching performance much
worse.  For example, with a minibuffer containing "*/" with
point at the start, completion-pcm-all-completions would take
several seconds to match in project-find-file in the Emacs repo.

Now, we run completion-pcm--pattern->segments on the pattern
first, which adds additional structure to the pattern, including
consolidating adjacent regexes into a single sublist.  Then
completion-pcm--segments->regex generates a single regex
wildcard for each of those pattern wildcards.  As a consequence,
we need to update the callers of completion-pcm--pattern->regex
which pass non-nil GROUP.  This provides a substantial
performance improvement in various edge cases.

* lisp/minibuffer.el (completion-pcm--pattern->segments)
(completion-pcm--segments->regex): Add.  (bug#79693)
(completion-pcm--pattern->regex)
(completion-pcm--merge-completions): Call pattern->segments and
segments->regex.
(completion-pcm--pattern-point-idx): Delete.
(completion-pcm--segments-point-idx): Add.
(completion-pcm--hilit-commonality): Call
completion-pcm--segments-point-idx to find the submatch
containing point.
* test/lisp/minibuffer-tests.el (completion-pcm-test-5): Add
more tests of highlighting the first difference.
(completion-pcm-test-pattern->regex): Add tests showing the
regex form of PCM patterns.
---
 lisp/minibuffer.el            | 112 +++++++++++++++++++---------------
 test/lisp/minibuffer-tests.el |  38 +++++++++++-
 2 files changed, 100 insertions(+), 50 deletions(-)

diff --git a/lisp/minibuffer.el b/lisp/minibuffer.el
index 9a78f631e02..4e6ddd928f4 100644
--- a/lisp/minibuffer.el
+++ b/lisp/minibuffer.el
@@ -4291,36 +4291,59 @@ completion-pcm--optimize-pattern
         (_ (push (pop p) n))))
     (nreverse n)))
 
+(defun completion-pcm--pattern->segments (pattern)
+  "Segment PATTERN into more structured sublists.
+
+Returns a list of lists which when concatenated is semantically the same
+as PATTERN.
+
+The first element in each sublist is a (possibly empty) string.  The
+remaining elements in the sublist are all wildcard symbols.  If PATTERN
+ends with a wildcard, then each sublist is guaranteed to have at least
+one wildcard."
+  (let (ret)
+    (while pattern
+      (let ((fixed "")
+            wildcards)
+        ;; Pop strings from PATTERN and concatenate them.
+        (while (stringp (car-safe pattern))
+          (setq fixed (concat fixed (pop pattern))))
+        ;; Pop wildcards from PATTERN.
+        (while (and pattern (symbolp (car-safe pattern)))
+          (push (pop pattern) wildcards))
+        ;; The sublist is a fixed string followed by all the wildcards.
+        (push (cons fixed (nreverse wildcards)) ret)))
+    (nreverse ret)))
+
+(defun completion-pcm--segments->regex (segments &optional group)
+  (concat "\\`"
+          (mapconcat
+           (lambda (segment)
+             (concat
+              (regexp-quote (car segment))
+              (when (cdr segment)
+                (concat
+                 (when group "\\(")
+                 (if (cl-every (lambda (x) (eq x 'any-delim)) (cdr segment))
+                     (concat completion-pcm--delim-wild-regex "*?")
+                   "[^z-a]*?")
+                 (when group "\\)")))))
+           segments
+           "")))
+
 (defun completion-pcm--pattern->regex (pattern &optional group)
-  (let ((re
-         (concat "\\`"
-                 (mapconcat
-                  (lambda (x)
-                    (cond
-                     ((stringp x) (regexp-quote x))
-                     (t
-                      (let ((re (if (eq x 'any-delim)
-                                    (concat completion-pcm--delim-wild-regex "*?")
-                                  "[^z-a]*?")))
-                        (if (if (consp group) (memq x group) group)
-                            (concat "\\(" re "\\)")
-                          re)))))
-                  pattern
-                  ""))))
-    ;; Avoid pathological backtracking.
-    (while (string-match "\\.\\*\\?\\(?:\\\\[()]\\)*\\(\\.\\*\\?\\)" re)
-      (setq re (replace-match "" t t re 1)))
-    re))
-
-(defun completion-pcm--pattern-point-idx (pattern)
-  "Return index of subgroup corresponding to `point' element of PATTERN.
-Return nil if there's no such element."
+  (completion-pcm--segments->regex (completion-pcm--pattern->segments pattern) group))
+
+(defun completion-pcm--segments-point-idx (segments)
+  "Return index of subgroup corresponding to `point' element of SEGMENTS.
+Return nil if there's no such element.
+This is used in combination with `completion-pcm--segments->regex'."
   (let ((idx nil)
         (i 0))
-    (dolist (x pattern)
-      (unless (stringp x)
-        (incf i)
-        (if (eq x 'point) (setq idx i))))
+    (dolist (x segments)
+      (incf i)
+      (when (memq 'point (cdr x))
+        (setq idx i)))
     idx))
 
 (defun completion-pcm--all-completions (prefix pattern table pred)
@@ -4551,8 +4574,9 @@ completion-pcm--hilit-commonality
         completion-lazy-hilit-fn nil)
   (cond
    ((and completions (cl-loop for e in pattern thereis (stringp e)))
-    (let* ((re (completion-pcm--pattern->regex pattern 'group))
-           (point-idx (completion-pcm--pattern-point-idx pattern)))
+    (let* ((segments (completion-pcm--pattern->segments pattern))
+           (re (completion-pcm--segments->regex segments 'group))
+           (point-idx (completion-pcm--segments-point-idx segments)))
       (setq completion-pcm--regexp re)
       (cond (completion-lazy-hilit
              (setq completion-lazy-hilit-fn
@@ -4696,12 +4720,13 @@ completion-pcm--merge-completions
   (cond
    ((null (cdr strs)) (list (car strs)))
    (t
-    (let ((re (completion-pcm--pattern->regex pattern 'group))
+    (let ((segmented (completion-pcm--pattern->segments (append pattern '(any))))
           (ccs ()))                     ;Chopped completions.
 
       ;; First chop each string into the parts corresponding to each
       ;; non-constant element of `pattern', using regexp-matching.
-      (let ((case-fold-search completion-ignore-case))
+      (let ((re (concat (completion-pcm--segments->regex segmented t) "\\'"))
+            (case-fold-search completion-ignore-case))
         (dolist (str strs)
           (unless (string-match re str)
             (error "Internal error: %s doesn't match %s" str re))
@@ -4713,24 +4738,15 @@ completion-pcm--merge-completions
               (push (substring str last next) chopped)
               (setq last next)
               (setq i (1+ i)))
-            ;; Add the text corresponding to the implicit trailing `any'.
-            (push (substring str last) chopped)
             (push (nreverse chopped) ccs))))
 
       ;; Then for each of those non-constant elements, extract the
       ;; commonality between them.
-      (let ((res ())
-            (fixed "")
-            ;; Accumulate each stretch of wildcards, and process them as a unit.
-            (wildcards ()))
-        ;; Make the implicit trailing `any' explicit.
-        (dolist (elem (append pattern '(any)))
-          (if (stringp elem)
-              (progn
-                (setq fixed (concat fixed elem))
-                (setq wildcards nil))
+      (let ((res ()))
+        (dolist (elem segmented)
+          (let ((fixed (car elem))
+                (wildcards (cdr elem)))
             (let ((comps ()))
-              (push elem wildcards)
               (dolist (cc (prog1 ccs (setq ccs nil)))
                 (push (car cc) comps)
                 (push (cdr cc) ccs))
@@ -4768,7 +4784,8 @@ completion-pcm--merge-completions
                   (push prefix res)
                   ;; Push all the wildcards in this stretch, to preserve `point' and
                   ;; `star' wildcards before ELEM.
-                  (setq res (append wildcards res))
+                  (dolist (wildcard wildcards)
+                    (push wildcard res))
                   ;; Extract common suffix additionally to common prefix.
                   ;; Don't do it for `any' since it could lead to a merged
                   ;; completion that doesn't itself match the candidates.
@@ -4786,10 +4803,7 @@ completion-pcm--merge-completions
                                         comps))))))
                       (cl-assert (stringp suffix))
                       (unless (equal suffix "")
-                        (push suffix res))))
-                  ;; We pushed these wildcards on RES, so we're done with them.
-                  (setq wildcards nil))
-                (setq fixed "")))))
+                        (push suffix res)))))))))
         ;; We return it in reverse order.
         res)))))
 
diff --git a/test/lisp/minibuffer-tests.el b/test/lisp/minibuffer-tests.el
index c7b24a4928d..8114899c0ff 100644
--- a/test/lisp/minibuffer-tests.el
+++ b/test/lisp/minibuffer-tests.el
@@ -268,7 +268,29 @@ completion-pcm-test-5
   (should (null
            (completion--pcm-first-difference-pos
             (car (completion-pcm-all-completions
-                  "f" '("few" "many") nil 0))))))
+                  "f" '("few" "many") nil 0)))))
+  (should (equal
+           (completion--pcm-first-difference-pos
+            (car (completion-pcm-all-completions
+                  "a*" '("ab" "ac") nil 2)))
+           1))
+  (should (equal
+           (completion--pcm-first-difference-pos
+            (car (completion-pcm-all-completions
+                  "a*" '("ab" "ac") nil 1)))
+           1))
+  (should (equal
+           (completion--pcm-first-difference-pos
+            (car (completion-pcm-all-completions
+                  "a*x" '("abx" "acx") nil 2)))
+           1))
+  (should (equal
+           (completion--pcm-first-difference-pos
+            (car (completion-pcm-all-completions
+                  "a*x" '("abxd" "acxe") nil 3)))
+           ;; FIXME: the highlighting should start at the 4th character
+           ;; rather than the third.
+           3)))
 
 (ert-deftest completion-pcm-test-6 ()
   ;; Wildcards and delimiters work
@@ -339,6 +361,20 @@ completion-pcm-test-anydelim
                   "-x" '("-_.x" "-__x") nil 2)
                  '("-_x" . 3))))
 
+(ert-deftest completion-pcm-test-pattern->regex ()
+  (should (equal (completion-pcm--pattern->regex
+                  '("A" any prefix "B" point "C"))
+                 "\\`A[^z-a]*?B[^z-a]*?C"))
+  (should (equal (completion-pcm--pattern->regex
+                  '(any any-delim prefix "A" "B" "C"))
+                 "\\`[^z-a]*?ABC"))
+  (should (equal (completion-pcm--pattern->regex
+                  '(any-delim "A" "B" star "C"))
+                 "\\`[-_./:| *]*?AB[^z-a]*?C"))
+  (should (equal (completion-pcm--pattern->regex
+                  '(any "A" any-delim "B" any-delim "C" any))
+                 "\\`[^z-a]*?A[-_./:| *]*?B[-_./:| *]*?C[^z-a]*?")))
+
 (ert-deftest completion-pcm-bug4219 ()
   ;; With `completion-ignore-case', try-completion should change the
   ;; case of existing text when the completions have different casing.
-- 
2.43.7

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Tue, 28 Oct 2025 12:58:02 GMT) Full text and rfc822 format available.

Message #35 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79693 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 juri <at> linkov.net
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Tue, 28 Oct 2025 14:56:45 +0200

> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>,  juri <at> linkov.net,
>    79693 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
> Date: Mon, 27 Oct 2025 17:51:24 -0400
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> >> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> >> Cc: Spencer Baugh <sbaugh <at> janestreet.com>,  dmitry <at> gutov.dev,
> >>   79693 <at> debbugs.gnu.org,  juri <at> linkov.net
> >> Date: Mon, 27 Oct 2025 14:33:03 -0400
> >> 
> >> >> I believe we are unlikely to have performance regressions because the
> >> >> regexps are getting strictly simpler.  The only thing changing about
> >> >> them is the removal of "[^z-a]*?" instances where there were previously
> >> >> multiple of those in sequence.
> >> >> 
> >> >> I think it's highly unlikely that removing "[^z-a]*?" from a regexp will
> >> >> make it run slower.
> >> >
> >> > This kind of reasoning tends to fail us time and again.
> >> >
> >> > Is it possible to come up with some decent body of tests in various
> >> > scenarios, to give us more confidence that we don't favor one use case
> >> > at the expense of others?
> >> 
> >> Not sure if you're worried about semantics or performance, but in terms
> >> of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*".
> >> I think it would take a *really* weird corner case to show otherwise
> >> (like maybe some situation where the extra work performed for
> >> "[^z-a]*[^z-a]*" causes extra paging which pushes some other task out of
> >> its working set and thus indirectly ends up giving Emacs more CPU
> >> time?).
> >
> > But the patch does much more that replace "[^z-a]*[^z-a]*" with "[^z-a]*"!
> 
> Okay, that's reasonable: it's not immediately obvious that that's the
> only thing the patch does, there could easily be bugs in the patch which
> make it do something else too.
> 
> So, I have now also added some tests which verify that the generated
> regexes have the expected form.
> 
> If you run these tests on an old version, you can see that the only
> effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?", or
> "[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?".
> 
> I think that's enough to be confident it's right.

Thanks.  Would it make sense to leave the code alone, and just add
some simple replacement code at the end which replaces each instance
of "[^z-a]*[^z-a]*" with "[^z-a]*"?  That'd be even safer, IMO.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Tue, 28 Oct 2025 13:00:02 GMT) Full text and rfc822 format available.

Message #38 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 79693 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Juri Linkov <juri <at> linkov.net>
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Tue, 28 Oct 2025 08:59:07 -0400

[Message part 1 (text/plain, inline)]

On Tue, Oct 28, 2025, 8:56 AM Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: Spencer Baugh <sbaugh <at> janestreet.com>
> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>,  juri <at> linkov.net,
> >    79693 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
> > Date: Mon, 27 Oct 2025 17:51:24 -0400
> >
> > Eli Zaretskii <eliz <at> gnu.org> writes:
> >
> > >> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> > >> Cc: Spencer Baugh <sbaugh <at> janestreet.com>,  dmitry <at> gutov.dev,
> > >>   79693 <at> debbugs.gnu.org,  juri <at> linkov.net
> > >> Date: Mon, 27 Oct 2025 14:33:03 -0400
> > >>
> > >> >> I believe we are unlikely to have performance regressions because
> the
> > >> >> regexps are getting strictly simpler.  The only thing changing
> about
> > >> >> them is the removal of "[^z-a]*?" instances where there were
> previously
> > >> >> multiple of those in sequence.
> > >> >>
> > >> >> I think it's highly unlikely that removing "[^z-a]*?" from a
> regexp will
> > >> >> make it run slower.
> > >> >
> > >> > This kind of reasoning tends to fail us time and again.
> > >> >
> > >> > Is it possible to come up with some decent body of tests in various
> > >> > scenarios, to give us more confidence that we don't favor one use
> case
> > >> > at the expense of others?
> > >>
> > >> Not sure if you're worried about semantics or performance, but in
> terms
> > >> of performance, "[^z-a]*" will always be faster than "[^z-a]*[^z-a]*".
> > >> I think it would take a *really* weird corner case to show otherwise
> > >> (like maybe some situation where the extra work performed for
> > >> "[^z-a]*[^z-a]*" causes extra paging which pushes some other task out
> of
> > >> its working set and thus indirectly ends up giving Emacs more CPU
> > >> time?).
> > >
> > > But the patch does much more that replace "[^z-a]*[^z-a]*" with
> "[^z-a]*"!
> >
> > Okay, that's reasonable: it's not immediately obvious that that's the
> > only thing the patch does, there could easily be bugs in the patch which
> > make it do something else too.
> >
> > So, I have now also added some tests which verify that the generated
> > regexes have the expected form.
> >
> > If you run these tests on an old version, you can see that the only
> > effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?", or
> > "[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?".
> >
> > I think that's enough to be confident it's right.
>
> Thanks.  Would it make sense to leave the code alone, and just add
> some simple replacement code at the end which replaces each instance
> of "[^z-a]*[^z-a]*" with "[^z-a]*"?  That'd be even safer, IMO.
>

Unfortunately that's not possible because it changes the structure of the
capture groups, and that's what requires the other changes.

>

[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Tue, 28 Oct 2025 13:19:02 GMT) Full text and rfc822 format available.

Message #41 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79693 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 juri <at> linkov.net
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Tue, 28 Oct 2025 15:18:23 +0200

> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Date: Tue, 28 Oct 2025 08:59:07 -0400
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, Juri Linkov <juri <at> linkov.net>, 79693 <at> debbugs.gnu.org, 
>  	Dmitry Gutov <dmitry <at> gutov.dev>
> 
>  > If you run these tests on an old version, you can see that the only
>  > effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?", or
>  > "[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?".
>  > 
>  > I think that's enough to be confident it's right.
> 
>  Thanks.  Would it make sense to leave the code alone, and just add
>  some simple replacement code at the end which replaces each instance
>  of "[^z-a]*[^z-a]*" with "[^z-a]*"?  That'd be even safer, IMO.
> 
> Unfortunately that's not possible because it changes the structure of the capture groups, and that's what
> requires the other changes.

Too bad.  I guess we will have to keep our fingers crossed, as there's
nothing better.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Wed, 29 Oct 2025 20:33:02 GMT) Full text and rfc822 format available.

Message #44 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 79693 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Juri Linkov <juri <at> linkov.net>
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Wed, 29 Oct 2025 16:32:31 -0400

[Message part 1 (text/plain, inline)]

So then we're good to install this change, right?  Could someone please do
that for me?

On Tue, Oct 28, 2025, 9:18 AM Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: Spencer Baugh <sbaugh <at> janestreet.com>
> > Date: Tue, 28 Oct 2025 08:59:07 -0400
> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, Juri Linkov <
> juri <at> linkov.net>, 79693 <at> debbugs.gnu.org,
> >       Dmitry Gutov <dmitry <at> gutov.dev>
> >
> >  > If you run these tests on an old version, you can see that the only
> >  > effect of my change is to replace "[^z-a]*?[^z-a]*?" with "[^z-a]*?",
> or
> >  > "[^z-a]*?A[-_./:| *]*?" with "[^z-a]*?".
> >  >
> >  > I think that's enough to be confident it's right.
> >
> >  Thanks.  Would it make sense to leave the code alone, and just add
> >  some simple replacement code at the end which replaces each instance
> >  of "[^z-a]*[^z-a]*" with "[^z-a]*"?  That'd be even safer, IMO.
> >
> > Unfortunately that's not possible because it changes the structure of
> the capture groups, and that's what
> > requires the other changes.
>
> Too bad.  I guess we will have to keep our fingers crossed, as there's
> nothing better.
>

[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79693; Package emacs. (Thu, 30 Oct 2025 17:29:01 GMT) Full text and rfc822 format available.

Message #47 received at 79693 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79693 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Dmitry Gutov <dmitry <at> gutov.dev>
Subject: Re: bug#79693: [PATCH] Optimize PCM regex to not contain adjacent
 wildcards
Date: Thu, 30 Oct 2025 19:25:21 +0200

> So then we're good to install this change, right?  Could someone please do
> that for me?

I tested this change and found no problems, so now it's pushed.

I wonder is it ok to close this report given this note in the patch:

  ;; FIXME: the highlighting should start at the 4th character
  ;; rather than the third.

But I don't see where is the problem:

  (completion-pcm-all-completions
   "a*x" '("abxd" "acxe") nil 3)
  =>
  (#("abxd"
     0 1 (face completions-common-part)
     2 3 (face completions-common-part)
     3 4 (face completions-first-difference))
   #("acxe"
     0 1 (face completions-common-part)
     2 3 (face completions-common-part)
     3 4 (face completions-first-difference))
   . 0)

Here highlighting with 'completions-first-difference'
starts at the 4th character.

This bug report was last modified 59 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #79693 [PATCH] Optimize PCM regex to not contain adjacent wildcards

GNU bug report logs - #79693
[PATCH] Optimize PCM regex to not contain adjacent wildcards