GNU bug report logs - #66806
30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program

Previous Next

Package: emacs;

Reported by: Jim Porter <jporterbugs <at> gmail.com>

Date: Sun, 29 Oct 2023 05:37:02 UTC

Severity: normal

Tags: patch

Found in version 30.0.50

Fixed in version 30.1

Done: Dmitry Gutov <dmitry <at> gutov.dev>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 66806 in the body.
You can then email your comments to 66806 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Sun, 29 Oct 2023 05:37:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jim Porter <jporterbugs <at> gmail.com>:
New bug report received and forwarded. Copy sent to dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org. (Sun, 29 Oct 2023 05:37:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Porter <jporterbugs <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the
 search program
Date: Sat, 28 Oct 2023 22:36:07 -0700
[Message part 1 (text/plain, inline)]
X-Debbugs-Cc: dmitry <at> gutov.dev

To see this in action, you can do the following, starting from "emacs 
-Q" inside of a Git repo that contains submodules:

  M-x trace-function RET project-files RET
  C-x p g some-string RET

If you look at the trace, you'll see that the files in your submodules 
are returned from 'project-files', but so is the submodule directory. 
That's not really correct, since 'project-files' is supposed to return 
*files*, not directories. (There are already workarounds for this in 
'project-search' and 'project-query-replace-regexp'.)

By default, this is just a theoretical problem, but if you customize 
'xref-search-program-alist' and 'xref-search-program' to include some 
other program, this can cause real issues. For example, I tried to add 
"ag" to this list[1], and unfortunately, it just doesn't work in this 
case. The results for submodules get duplicated, and there's no way I 
can see with ag to search only the specified *files*, ignoring any 
directories. (Looking at the definition for ripgrep, I'm guessing the 
"-g '!*/'" is the trick for that program, but nothing similar works for ag.)

Attached are two possible patches for this: a minimal version that just 
fixes 'project-find-regexp', and a maximal version that fixes this in 
general, and should theoretically speed up 'project-search' and 
'project-query-replace-regexp' since they no longer need to call 
'file-regular-p' on every file.

Do either of these patches look ok?

[1] This is a long story, simplified for this message since the gory 
details aren't especially relevant.
[minimal_0001-Exclude-Git-submodules-when-getting-list-of-files-fo.patch (text/plain, attachment)]
[maximal_0001-Exclude-Git-submodules-from-project-files.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Sun, 29 Oct 2023 06:08:02 GMT) Full text and rfc822 format available.

Message #8 received at 66806 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Jim Porter <jporterbugs <at> gmail.com>
Cc: dmitry <at> gutov.dev, 66806 <at> debbugs.gnu.org
Subject: Re: bug#66806: 30.0.50;
 [PATCH] 'project-find-regexp' passes Git submodules to the search
 program
Date: Sun, 29 Oct 2023 08:06:44 +0200
> Cc: dmitry <at> gutov.dev
> Date: Sat, 28 Oct 2023 22:36:07 -0700
> From: Jim Porter <jporterbugs <at> gmail.com>
> 
> --- a/lisp/progmodes/project.el
> +++ b/lisp/progmodes/project.el
> @@ -960,7 +960,8 @@ project-find-regexp
>           (default-directory (project-root pr))
>           (files
>            (if (not current-prefix-arg)
> -              (project-files pr)
> +              ;; XXX: See the comment in project-query-replace-regexp.
> +              (cl-delete-if-not #'file-regular-p (project-files pr))
                  ^^^^^^^^^^^^^^^^
I think we want to prefer using seq.el functions, since seq.el is
nowadays preloaded.  Is there a good reason to use cl-delete-if-not
here?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Sun, 29 Oct 2023 17:56:02 GMT) Full text and rfc822 format available.

Message #11 received at 66806 <at> debbugs.gnu.org (full text, mbox):

From: Jim Porter <jporterbugs <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 66806 <at> debbugs.gnu.org
Subject: Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git
 submodules to the search program
Date: Sun, 29 Oct 2023 10:54:28 -0700
[Message part 1 (text/plain, inline)]
On 10/28/2023 11:06 PM, Eli Zaretskii wrote:
>> Cc: dmitry <at> gutov.dev
>> Date: Sat, 28 Oct 2023 22:36:07 -0700
>> From: Jim Porter <jporterbugs <at> gmail.com>
>>
>> --- a/lisp/progmodes/project.el
>> +++ b/lisp/progmodes/project.el
>> @@ -960,7 +960,8 @@ project-find-regexp
>>            (default-directory (project-root pr))
>>            (files
>>             (if (not current-prefix-arg)
>> -              (project-files pr)
>> +              ;; XXX: See the comment in project-query-replace-regexp.
>> +              (cl-delete-if-not #'file-regular-p (project-files pr))
>                    ^^^^^^^^^^^^^^^^
> I think we want to prefer using seq.el functions, since seq.el is
> nowadays preloaded.  Is there a good reason to use cl-delete-if-not
> here?

Well, that's just copy-pasted from some other functions in project.el. 
If we want to go the minimal route, I could update all those workarounds.

Or we could go the maximal route and fix it at its source. Here's an 
updated patch for the maximal route that uses 'seq-difference'.

Assuming we're ok with the performance characteristics of the maximal 
patch, I think the maximal route is best: it fixes the issue at its 
source. For performance, it should be faster by default, but a bit 
slower when 'project-vc-merge-submodules' is nil (since we need an extra 
call to "git" to get the list of submodules). However, that slowness is 
compensated for by eliminating the need to call 'file-regular-p' on all 
the results for some functions that really do need 'project-files' to 
return only files. (If we're really concerned about *exact* perf 
numbers, I can try to collect some. Just let me know.)
[maximal_0001-Exclude-Git-submodules-from-project-files.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Sun, 29 Oct 2023 19:24:02 GMT) Full text and rfc822 format available.

Message #14 received at 66806 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Jim Porter <jporterbugs <at> gmail.com>
Cc: dmitry <at> gutov.dev, 66806 <at> debbugs.gnu.org
Subject: Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git
 submodules to the search program
Date: Sun, 29 Oct 2023 21:22:54 +0200
> Date: Sun, 29 Oct 2023 10:54:28 -0700
> Cc: dmitry <at> gutov.dev, 66806 <at> debbugs.gnu.org
> From: Jim Porter <jporterbugs <at> gmail.com>
> 
> >> -              (project-files pr)
> >> +              ;; XXX: See the comment in project-query-replace-regexp.
> >> +              (cl-delete-if-not #'file-regular-p (project-files pr))
> >                    ^^^^^^^^^^^^^^^^
> > I think we want to prefer using seq.el functions, since seq.el is
> > nowadays preloaded.  Is there a good reason to use cl-delete-if-not
> > here?
> 
> Well, that's just copy-pasted from some other functions in project.el. 
> If we want to go the minimal route, I could update all those workarounds.
> 
> Or we could go the maximal route and fix it at its source. Here's an 
> updated patch for the maximal route that uses 'seq-difference'.

That's up to Dmitry; from my POV it is enough that new code prefers
seq.el.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Sun, 29 Oct 2023 21:43:02 GMT) Full text and rfc822 format available.

Message #17 received at 66806 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Jim Porter <jporterbugs <at> gmail.com>, 66806 <at> debbugs.gnu.org
Subject: Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git
 submodules to the search program
Date: Sun, 29 Oct 2023 23:41:30 +0200
[Message part 1 (text/plain, inline)]
Hi Jim,

On 29/10/2023 07:36, Jim Porter wrote:
> By default, this is just a theoretical problem, but if you customize 
> 'xref-search-program-alist' and 'xref-search-program' to include some 
> other program, this can cause real issues. For example, I tried to add 
> "ag" to this list[1], and unfortunately, it just doesn't work in this 
> case. The results for submodules get duplicated, and there's no way I 
> can see with ag to search only the specified *files*, ignoring any 
> directories. (Looking at the definition for ripgrep, I'm guessing the 
> "-g '!*/'" is the trick for that program, but nothing similar works for 
> ag.)
> 
> Attached are two possible patches for this: a minimal version that just 
> fixes 'project-find-regexp', and a maximal version that fixes this in 
> general, and should theoretically speed up 'project-search' and 
> 'project-query-replace-regexp' since they no longer need to call 
> 'file-regular-p' on every file.

I kept this unfortunate situation around because every obvious fix 
brought non-negligible performance regressions: the version with 
file-regular-p slowed one of my examples (Mozilla's repo) by 370%. Your 
cl-set-difference version slowed it down by 10-14% -- better, but still 
something that seemed worse on balance when I tried this before, so I 
preferred to work around it in command implementations: the "-s" or 
"--no-messages" flags xref-search-program-alist.

And that's not to mention usage over Tramp (which would be affected by 
the +1 process call that you mentioned as well, but that seems unavoidable).

Anyway, after recent experience micro-optimizing list operations, I came 
up with this version where the impact seems minimal.

WDYT?
[project-no-submodules.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Sun, 29 Oct 2023 22:04:02 GMT) Full text and rfc822 format available.

Message #20 received at 66806 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Jim Porter <jporterbugs <at> gmail.com>, 66806 <at> debbugs.gnu.org
Subject: Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git
 submodules to the search program
Date: Mon, 30 Oct 2023 00:02:48 +0200
On 29/10/2023 07:36, Jim Porter wrote:
> (Looking at the definition for ripgrep, I'm guessing the "-g '!*/'" is 
> the trick for that program, but nothing similar works for ag.)

Actually, it's a failed solution which I copied from the rg's issue 
tracker but I didn't test it enough or it broke in some later versions, 
and then I put off dealing with it when finally noticed. Maybe it never 
worked and only could filter out directories when those are passed with 
a trailing slash.

What should work, is adding the argument "--max-depth 0" (stopping the 
traversal of passed directories). For ag, the corresponding fix seems to 
be "--depth 0", but I haven't tested that in practice.

Anyway, see the other email, which should make it all unnecessary 
(unless you have a 3rd party package which has to work with older 
project.el too).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Sun, 29 Oct 2023 22:16:02 GMT) Full text and rfc822 format available.

Message #23 received at 66806 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>, Jim Porter <jporterbugs <at> gmail.com>
Cc: 66806 <at> debbugs.gnu.org
Subject: Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git
 submodules to the search program
Date: Mon, 30 Oct 2023 00:14:48 +0200
On 29/10/2023 08:06, Eli Zaretskii wrote:
>> Cc:dmitry <at> gutov.dev
>> Date: Sat, 28 Oct 2023 22:36:07 -0700
>> From: Jim Porter<jporterbugs <at> gmail.com>
>>
>> --- a/lisp/progmodes/project.el
>> +++ b/lisp/progmodes/project.el
>> @@ -960,7 +960,8 @@ project-find-regexp
>>            (default-directory (project-root pr))
>>            (files
>>             (if (not current-prefix-arg)
>> -              (project-files pr)
>> +              ;; XXX: See the comment in project-query-replace-regexp.
>> +              (cl-delete-if-not #'file-regular-p (project-files pr))
>                    ^^^^^^^^^^^^^^^^
> I think we want to prefer using seq.el functions, since seq.el is
> nowadays preloaded.  Is there a good reason to use cl-delete-if-not
> here?

I'm okay with using seq with other things equal, but in this case both 
cl- approaches are too slow, and seq-difference is no better because of 
consing and indirection overhead (about 10-20% slower than current). I'd 
say it's a shortcoming of seq.el: only having non-destructive versions.

I had some hope for cl-nset-difference, but looking at the 
implementation it just delegates to the non-destructive cousin (I guess 
providing the optimized implementation was left as TODO).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Mon, 30 Oct 2023 00:27:02 GMT) Full text and rfc822 format available.

Message #26 received at 66806 <at> debbugs.gnu.org (full text, mbox):

From: Jim Porter <jporterbugs <at> gmail.com>
To: Dmitry Gutov <dmitry <at> gutov.dev>, 66806 <at> debbugs.gnu.org
Subject: Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git
 submodules to the search program
Date: Sun, 29 Oct 2023 17:25:36 -0700
On 10/29/2023 3:02 PM, Dmitry Gutov wrote:
> What should work, is adding the argument "--max-depth 0" (stopping the 
> traversal of passed directories). For ag, the corresponding fix seems to 
> be "--depth 0", but I haven't tested that in practice.

I tried that, and sadly it didn't work. That could be a bug in ag though.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Mon, 30 Oct 2023 00:59:02 GMT) Full text and rfc822 format available.

Message #29 received at 66806 <at> debbugs.gnu.org (full text, mbox):

From: Jim Porter <jporterbugs <at> gmail.com>
To: Dmitry Gutov <dmitry <at> gutov.dev>, 66806 <at> debbugs.gnu.org
Subject: Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git
 submodules to the search program
Date: Sun, 29 Oct 2023 17:58:01 -0700
[Message part 1 (text/plain, inline)]
On 10/29/2023 2:41 PM, Dmitry Gutov wrote:
> And that's not to mention usage over Tramp (which would be affected by 
> the +1 process call that you mentioned as well, but that seems 
> unavoidable).

Yeah, I don't see a way around that, unless we constructed a complex 
command to run via Tramp that does it all in one go.

(I looked into using the "--stage" argument for "git ls-files", which 
gets most of the way to fixing this, but you could break that logic with 
evil file names not in the tree...)

> Anyway, after recent experience micro-optimizing list operations, I came 
> up with this version where the impact seems minimal.
> 
> WDYT?

Thanks, that helped form the basis for the attached patch. It moves the 
'member' call into the lambda for the original 'mapcar', and then we can 
just 'delq' all the nil elements. Benchmarking against the Emacs repo, 
this still seems to have a minimal perf impact.

There might also be a benefit to using "git ls-files 
--recurse-submodules" when we can (i.e. when not using "-o"), but maybe 
we can leave that optimization for another day.
[0001-Exclude-Git-submodules-from-project-files.patch (text/plain, attachment)]

Reply sent to Dmitry Gutov <dmitry <at> gutov.dev>:
You have taken responsibility. (Mon, 30 Oct 2023 02:02:02 GMT) Full text and rfc822 format available.

Notification sent to Jim Porter <jporterbugs <at> gmail.com>:
bug acknowledged by developer. (Mon, 30 Oct 2023 02:02:02 GMT) Full text and rfc822 format available.

Message #34 received at 66806-done <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Jim Porter <jporterbugs <at> gmail.com>, 66806-done <at> debbugs.gnu.org
Subject: Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git
 submodules to the search program
Date: Mon, 30 Oct 2023 04:00:51 +0200
Version: 30.1

On 30/10/2023 02:58, Jim Porter wrote:
> On 10/29/2023 2:41 PM, Dmitry Gutov wrote:
>> And that's not to mention usage over Tramp (which would be affected by 
>> the +1 process call that you mentioned as well, but that seems 
>> unavoidable).
> 
> Yeah, I don't see a way around that, unless we constructed a complex 
> command to run via Tramp that does it all in one go.
> 
> (I looked into using the "--stage" argument for "git ls-files", which 
> gets most of the way to fixing this, but you could break that logic with 
> evil file names not in the tree...)

Filtering out gitlinks based on the mode bits? Clever.

Would probably slow down the parsing of the output, though. Not sure by 
how much.

>> Anyway, after recent experience micro-optimizing list operations, I 
>> came up with this version where the impact seems minimal.
>>
>> WDYT?
> 
> Thanks, that helped form the basis for the attached patch. It moves the 
> 'member' call into the lambda for the original 'mapcar', and then we can 
> just 'delq' all the nil elements. Benchmarking against the Emacs repo, 
> this still seems to have a minimal perf impact.

Ah, nice. delq is pretty handy to abbreviate the relinking.

We could combine both steps to eliminate an extra list altogether, but 
that only yields a further 2-3% improvement. Maybe later.

> There might also be a benefit to using "git ls-files 
> --recurse-submodules" when we can (i.e. when not using "-o"), but maybe 
> we can leave that optimization for another day.

As long as it doesn't result in too much code duplication.

Anyway, I've pushed your latest patch. Thanks! And closing.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66806; Package emacs. (Mon, 30 Oct 2023 03:55:02 GMT) Full text and rfc822 format available.

Message #37 received at 66806 <at> debbugs.gnu.org (full text, mbox):

From: Jim Porter <jporterbugs <at> gmail.com>
To: 66806 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git
 submodules to the search program
Date: Sun, 29 Oct 2023 20:53:23 -0700
On 10/29/2023 7:00 PM, Dmitry Gutov wrote:
> On 30/10/2023 02:58, Jim Porter wrote:
>> (I looked into using the "--stage" argument for "git ls-files", which 
>> gets most of the way to fixing this, but you could break that logic 
>> with evil file names not in the tree...)
> 
> Filtering out gitlinks based on the mode bits? Clever.

Well, it'd be even more clever if it weren't for the issue I mentioned. ;)

> Anyway, I've pushed your latest patch. Thanks! And closing.

Thanks for merging!




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 27 Nov 2023 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 3 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.