GNU bug report logs -
#66806
30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
Previous Next
Reported by: Jim Porter <jporterbugs <at> gmail.com>
Date: Sun, 29 Oct 2023 05:37:02 UTC
Severity: normal
Tags: patch
Found in version 30.0.50
Fixed in version 30.1
Done: Dmitry Gutov <dmitry <at> gutov.dev>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 66806 in the body.
You can then email your comments to 66806 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Sun, 29 Oct 2023 05:37:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Jim Porter <jporterbugs <at> gmail.com>
:
New bug report received and forwarded. Copy sent to
dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org
.
(Sun, 29 Oct 2023 05:37:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
X-Debbugs-Cc: dmitry <at> gutov.dev
To see this in action, you can do the following, starting from "emacs
-Q" inside of a Git repo that contains submodules:
M-x trace-function RET project-files RET
C-x p g some-string RET
If you look at the trace, you'll see that the files in your submodules
are returned from 'project-files', but so is the submodule directory.
That's not really correct, since 'project-files' is supposed to return
*files*, not directories. (There are already workarounds for this in
'project-search' and 'project-query-replace-regexp'.)
By default, this is just a theoretical problem, but if you customize
'xref-search-program-alist' and 'xref-search-program' to include some
other program, this can cause real issues. For example, I tried to add
"ag" to this list[1], and unfortunately, it just doesn't work in this
case. The results for submodules get duplicated, and there's no way I
can see with ag to search only the specified *files*, ignoring any
directories. (Looking at the definition for ripgrep, I'm guessing the
"-g '!*/'" is the trick for that program, but nothing similar works for ag.)
Attached are two possible patches for this: a minimal version that just
fixes 'project-find-regexp', and a maximal version that fixes this in
general, and should theoretically speed up 'project-search' and
'project-query-replace-regexp' since they no longer need to call
'file-regular-p' on every file.
Do either of these patches look ok?
[1] This is a long story, simplified for this message since the gory
details aren't especially relevant.
[minimal_0001-Exclude-Git-submodules-when-getting-list-of-files-fo.patch (text/plain, attachment)]
[maximal_0001-Exclude-Git-submodules-from-project-files.patch (text/plain, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Sun, 29 Oct 2023 06:08:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 66806 <at> debbugs.gnu.org (full text, mbox):
> Cc: dmitry <at> gutov.dev
> Date: Sat, 28 Oct 2023 22:36:07 -0700
> From: Jim Porter <jporterbugs <at> gmail.com>
>
> --- a/lisp/progmodes/project.el
> +++ b/lisp/progmodes/project.el
> @@ -960,7 +960,8 @@ project-find-regexp
> (default-directory (project-root pr))
> (files
> (if (not current-prefix-arg)
> - (project-files pr)
> + ;; XXX: See the comment in project-query-replace-regexp.
> + (cl-delete-if-not #'file-regular-p (project-files pr))
^^^^^^^^^^^^^^^^
I think we want to prefer using seq.el functions, since seq.el is
nowadays preloaded. Is there a good reason to use cl-delete-if-not
here?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Sun, 29 Oct 2023 17:56:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 66806 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 10/28/2023 11:06 PM, Eli Zaretskii wrote:
>> Cc: dmitry <at> gutov.dev
>> Date: Sat, 28 Oct 2023 22:36:07 -0700
>> From: Jim Porter <jporterbugs <at> gmail.com>
>>
>> --- a/lisp/progmodes/project.el
>> +++ b/lisp/progmodes/project.el
>> @@ -960,7 +960,8 @@ project-find-regexp
>> (default-directory (project-root pr))
>> (files
>> (if (not current-prefix-arg)
>> - (project-files pr)
>> + ;; XXX: See the comment in project-query-replace-regexp.
>> + (cl-delete-if-not #'file-regular-p (project-files pr))
> ^^^^^^^^^^^^^^^^
> I think we want to prefer using seq.el functions, since seq.el is
> nowadays preloaded. Is there a good reason to use cl-delete-if-not
> here?
Well, that's just copy-pasted from some other functions in project.el.
If we want to go the minimal route, I could update all those workarounds.
Or we could go the maximal route and fix it at its source. Here's an
updated patch for the maximal route that uses 'seq-difference'.
Assuming we're ok with the performance characteristics of the maximal
patch, I think the maximal route is best: it fixes the issue at its
source. For performance, it should be faster by default, but a bit
slower when 'project-vc-merge-submodules' is nil (since we need an extra
call to "git" to get the list of submodules). However, that slowness is
compensated for by eliminating the need to call 'file-regular-p' on all
the results for some functions that really do need 'project-files' to
return only files. (If we're really concerned about *exact* perf
numbers, I can try to collect some. Just let me know.)
[maximal_0001-Exclude-Git-submodules-from-project-files.patch (text/plain, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Sun, 29 Oct 2023 19:24:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 66806 <at> debbugs.gnu.org (full text, mbox):
> Date: Sun, 29 Oct 2023 10:54:28 -0700
> Cc: dmitry <at> gutov.dev, 66806 <at> debbugs.gnu.org
> From: Jim Porter <jporterbugs <at> gmail.com>
>
> >> - (project-files pr)
> >> + ;; XXX: See the comment in project-query-replace-regexp.
> >> + (cl-delete-if-not #'file-regular-p (project-files pr))
> > ^^^^^^^^^^^^^^^^
> > I think we want to prefer using seq.el functions, since seq.el is
> > nowadays preloaded. Is there a good reason to use cl-delete-if-not
> > here?
>
> Well, that's just copy-pasted from some other functions in project.el.
> If we want to go the minimal route, I could update all those workarounds.
>
> Or we could go the maximal route and fix it at its source. Here's an
> updated patch for the maximal route that uses 'seq-difference'.
That's up to Dmitry; from my POV it is enough that new code prefers
seq.el.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Sun, 29 Oct 2023 21:43:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 66806 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi Jim,
On 29/10/2023 07:36, Jim Porter wrote:
> By default, this is just a theoretical problem, but if you customize
> 'xref-search-program-alist' and 'xref-search-program' to include some
> other program, this can cause real issues. For example, I tried to add
> "ag" to this list[1], and unfortunately, it just doesn't work in this
> case. The results for submodules get duplicated, and there's no way I
> can see with ag to search only the specified *files*, ignoring any
> directories. (Looking at the definition for ripgrep, I'm guessing the
> "-g '!*/'" is the trick for that program, but nothing similar works for
> ag.)
>
> Attached are two possible patches for this: a minimal version that just
> fixes 'project-find-regexp', and a maximal version that fixes this in
> general, and should theoretically speed up 'project-search' and
> 'project-query-replace-regexp' since they no longer need to call
> 'file-regular-p' on every file.
I kept this unfortunate situation around because every obvious fix
brought non-negligible performance regressions: the version with
file-regular-p slowed one of my examples (Mozilla's repo) by 370%. Your
cl-set-difference version slowed it down by 10-14% -- better, but still
something that seemed worse on balance when I tried this before, so I
preferred to work around it in command implementations: the "-s" or
"--no-messages" flags xref-search-program-alist.
And that's not to mention usage over Tramp (which would be affected by
the +1 process call that you mentioned as well, but that seems unavoidable).
Anyway, after recent experience micro-optimizing list operations, I came
up with this version where the impact seems minimal.
WDYT?
[project-no-submodules.diff (text/x-patch, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Sun, 29 Oct 2023 22:04:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 66806 <at> debbugs.gnu.org (full text, mbox):
On 29/10/2023 07:36, Jim Porter wrote:
> (Looking at the definition for ripgrep, I'm guessing the "-g '!*/'" is
> the trick for that program, but nothing similar works for ag.)
Actually, it's a failed solution which I copied from the rg's issue
tracker but I didn't test it enough or it broke in some later versions,
and then I put off dealing with it when finally noticed. Maybe it never
worked and only could filter out directories when those are passed with
a trailing slash.
What should work, is adding the argument "--max-depth 0" (stopping the
traversal of passed directories). For ag, the corresponding fix seems to
be "--depth 0", but I haven't tested that in practice.
Anyway, see the other email, which should make it all unnecessary
(unless you have a 3rd party package which has to work with older
project.el too).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Sun, 29 Oct 2023 22:16:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 66806 <at> debbugs.gnu.org (full text, mbox):
On 29/10/2023 08:06, Eli Zaretskii wrote:
>> Cc:dmitry <at> gutov.dev
>> Date: Sat, 28 Oct 2023 22:36:07 -0700
>> From: Jim Porter<jporterbugs <at> gmail.com>
>>
>> --- a/lisp/progmodes/project.el
>> +++ b/lisp/progmodes/project.el
>> @@ -960,7 +960,8 @@ project-find-regexp
>> (default-directory (project-root pr))
>> (files
>> (if (not current-prefix-arg)
>> - (project-files pr)
>> + ;; XXX: See the comment in project-query-replace-regexp.
>> + (cl-delete-if-not #'file-regular-p (project-files pr))
> ^^^^^^^^^^^^^^^^
> I think we want to prefer using seq.el functions, since seq.el is
> nowadays preloaded. Is there a good reason to use cl-delete-if-not
> here?
I'm okay with using seq with other things equal, but in this case both
cl- approaches are too slow, and seq-difference is no better because of
consing and indirection overhead (about 10-20% slower than current). I'd
say it's a shortcoming of seq.el: only having non-destructive versions.
I had some hope for cl-nset-difference, but looking at the
implementation it just delegates to the non-destructive cousin (I guess
providing the optimized implementation was left as TODO).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Mon, 30 Oct 2023 00:27:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 66806 <at> debbugs.gnu.org (full text, mbox):
On 10/29/2023 3:02 PM, Dmitry Gutov wrote:
> What should work, is adding the argument "--max-depth 0" (stopping the
> traversal of passed directories). For ag, the corresponding fix seems to
> be "--depth 0", but I haven't tested that in practice.
I tried that, and sadly it didn't work. That could be a bug in ag though.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Mon, 30 Oct 2023 00:59:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 66806 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 10/29/2023 2:41 PM, Dmitry Gutov wrote:
> And that's not to mention usage over Tramp (which would be affected by
> the +1 process call that you mentioned as well, but that seems
> unavoidable).
Yeah, I don't see a way around that, unless we constructed a complex
command to run via Tramp that does it all in one go.
(I looked into using the "--stage" argument for "git ls-files", which
gets most of the way to fixing this, but you could break that logic with
evil file names not in the tree...)
> Anyway, after recent experience micro-optimizing list operations, I came
> up with this version where the impact seems minimal.
>
> WDYT?
Thanks, that helped form the basis for the attached patch. It moves the
'member' call into the lambda for the original 'mapcar', and then we can
just 'delq' all the nil elements. Benchmarking against the Emacs repo,
this still seems to have a minimal perf impact.
There might also be a benefit to using "git ls-files
--recurse-submodules" when we can (i.e. when not using "-o"), but maybe
we can leave that optimization for another day.
[0001-Exclude-Git-submodules-from-project-files.patch (text/plain, attachment)]
Reply sent
to
Dmitry Gutov <dmitry <at> gutov.dev>
:
You have taken responsibility.
(Mon, 30 Oct 2023 02:02:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Jim Porter <jporterbugs <at> gmail.com>
:
bug acknowledged by developer.
(Mon, 30 Oct 2023 02:02:02 GMT)
Full text and
rfc822 format available.
Message #34 received at 66806-done <at> debbugs.gnu.org (full text, mbox):
Version: 30.1
On 30/10/2023 02:58, Jim Porter wrote:
> On 10/29/2023 2:41 PM, Dmitry Gutov wrote:
>> And that's not to mention usage over Tramp (which would be affected by
>> the +1 process call that you mentioned as well, but that seems
>> unavoidable).
>
> Yeah, I don't see a way around that, unless we constructed a complex
> command to run via Tramp that does it all in one go.
>
> (I looked into using the "--stage" argument for "git ls-files", which
> gets most of the way to fixing this, but you could break that logic with
> evil file names not in the tree...)
Filtering out gitlinks based on the mode bits? Clever.
Would probably slow down the parsing of the output, though. Not sure by
how much.
>> Anyway, after recent experience micro-optimizing list operations, I
>> came up with this version where the impact seems minimal.
>>
>> WDYT?
>
> Thanks, that helped form the basis for the attached patch. It moves the
> 'member' call into the lambda for the original 'mapcar', and then we can
> just 'delq' all the nil elements. Benchmarking against the Emacs repo,
> this still seems to have a minimal perf impact.
Ah, nice. delq is pretty handy to abbreviate the relinking.
We could combine both steps to eliminate an extra list altogether, but
that only yields a further 2-3% improvement. Maybe later.
> There might also be a benefit to using "git ls-files
> --recurse-submodules" when we can (i.e. when not using "-o"), but maybe
> we can leave that optimization for another day.
As long as it doesn't result in too much code duplication.
Anyway, I've pushed your latest patch. Thanks! And closing.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#66806
; Package
emacs
.
(Mon, 30 Oct 2023 03:55:02 GMT)
Full text and
rfc822 format available.
Message #37 received at 66806 <at> debbugs.gnu.org (full text, mbox):
On 10/29/2023 7:00 PM, Dmitry Gutov wrote:
> On 30/10/2023 02:58, Jim Porter wrote:
>> (I looked into using the "--stage" argument for "git ls-files", which
>> gets most of the way to fixing this, but you could break that logic
>> with evil file names not in the tree...)
>
> Filtering out gitlinks based on the mode bits? Clever.
Well, it'd be even more clever if it weren't for the issue I mentioned. ;)
> Anyway, I've pushed your latest patch. Thanks! And closing.
Thanks for merging!
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Mon, 27 Nov 2023 12:24:05 GMT)
Full text and
rfc822 format available.
This bug report was last modified 1 year and 38 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.