GNU bug report logs - #71094
[PATCH] Prefer to run find and grep in parallel in rgrep


Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Tue, 21 May 2024 14:36:01 UTC

Severity: normal

Tags: patch

Done: Andrea Corallo <acorallo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 71094 in the body.
You can then email your comments to 71094 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Tue, 21 May 2024 14:36:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Spencer Baugh <sbaugh <at> janestreet.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 21 May 2024 14:36:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: bug-gnu-emacs <at> gnu.org
Cc: Glenn Morris <rgm <at> gnu.org>, dmitry <at> gutov.dev
Subject: [PATCH] Prefer to run find and grep in parallel in rgrep
Date: Tue, 21 May 2024 10:35:07 -0400
[Message part 1 (text/plain, inline)]
Tags: patch


grep.el prefers to run "find" and "xargs grep" in a pipeline,
which means that "find" can continue searching the filesystem
while "xargs grep" searches files.  If find and xargs don't
support the flags required for this behavior, grep.el will fall
back to using the -exec flags to "find", which means "find" will
wait for each "grep" process to complete before continuing to
search the filesystem tree.  This behavior is controlled by
grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
the slower fallback.

In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
option was added for grep-find-use-xargs, which improves on
`exec' by running one "grep" process to search multiple files,
which `gnu' (by using xargs) already did.  However, the change
erroneously added the `exec-plus' case before the `gnu' case in
the autodetection code in grep-compute-defaults, so `exec-plus'
would be used even if `gnu' was supported.

This change just swaps the two cases, so the faster `gnu' option
is once again used in preference to `exec-plus'.  In my
benchmarking on a large repository, this provides a ~40%
speedup.
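
For illustration, the three values of grep-find-use-xargs correspond
roughly to the three command shapes below.  This is a minimal sketch,
not the exact command lines grep.el generates; the scratch directory,
the file names and the "needle" pattern are made up for the example.

```shell
#!/bin/sh
# Hypothetical scratch tree; names and pattern are illustrative only.
dir=$(mktemp -d "${TMPDIR:-/tmp}/rgrep-demo.XXXXXX")
mkdir -p "$dir/sub"
printf 'needle\n' > "$dir/a.txt"
printf 'hay\n' > "$dir/sub/b.txt"
printf 'needle again\n' > "$dir/sub/c.txt"

# `exec': one grep process per file; find waits for each one.
exec_out=$(find "$dir" -type f -exec grep -l needle {} \; | sort)

# `exec-plus': find collects file names into batches and runs one grep
# per batch, but still waits for that grep before walking further.
plus_out=$(find "$dir" -type f -exec grep -l needle {} + | sort)

# `gnu': find streams NUL-separated names into xargs, which runs grep
# on batches while find keeps walking the tree.
gnu_out=$(find "$dir" -type f -print0 | xargs -0 grep -l needle | sort)

# The three shapes differ in scheduling, not in what they find.
[ "$exec_out" = "$plus_out" ] && [ "$plus_out" = "$gnu_out" ] &&
    echo "all three shapes list the same files"

rm -rf "$dir"
```

Only the process scheduling differs between the shapes, which is why
the choice shows up as a timing difference rather than a change in
results.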


In GNU Emacs 29.2.50 (build 11, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2024-05-15 built on
 igm-qws-u22796a
Repository revision: 734740051bd377d24899d08d00ec8e1bb8e00e00
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.9 (Green Obsidian)

Configured using:
 'configure -C --with-x-toolkit=lucid --with-gif=ifavailable'

[0001-Prefer-to-run-find-and-grep-in-parallel-in-rgrep.patch (text/patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Tue, 21 May 2024 20:01:02 GMT) Full text and rfc822 format available.

Message #8 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Spencer Baugh <sbaugh <at> janestreet.com>, 71094 <at> debbugs.gnu.org
Cc: Glenn Morris <rgm <at> gnu.org>
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Tue, 21 May 2024 23:00:06 +0300
Hi Spencer,

On 21/05/2024 17:35, Spencer Baugh wrote:
> In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
> option was added for grep-find-use-xargs, which improves on
> `exec' by running one "grep" process to search multiple files,
> which `gnu' (by using xargs) already did.  However, the change
> erroneously added the `exec-plus' case before the `gnu' case in
> the autodetection code in grep-compute-defaults, so `exec-plus'
> would be used even if `gnu' was supported.

Perhaps the thinking was that piping data through an additional program, with
the associated copying, would be more expensive than delegating that to 'find'.

> This change just swaps the two cases, so the faster `gnu' option
> is once again used in preference to `exec-plus'.  In my
> benchmarking on a large repository, this provides a ~40%
> speedup.

I can confirm, an improvement of ~30% here. Specifically in the "many 
files, few matches" scenario. Nice find.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 12:01:02 GMT) Full text and rfc822 format available.

Message #11 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: rgm <at> gnu.org, 71094 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 14:59:39 +0300
> Cc: Glenn Morris <rgm <at> gnu.org>, dmitry <at> gutov.dev
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Date: Tue, 21 May 2024 10:35:07 -0400
> 
> grep.el prefers to run "find" and "xargs grep" in a pipeline,
> which means that "find" can continue searching the filesystem
> while "xargs grep" searches files.  If find and xargs don't
> support the flags required for this behavior, grep.el will fall
> back to using the -exec flags to "find", which means "find" will
> wait for each "grep" process to complete before continuing to
> search the filesystem tree.  This behavior is controlled by
> grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
> the slower fallback.
> 
> In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
> option was added for grep-find-use-xargs, which improves on
> `exec' by running one "grep" process to search multiple files,
> which `gnu' (by using xargs) already did.  However, the change
> erroneously added the `exec-plus' case before the `gnu' case in
> the autodetection code in grep-compute-defaults, so `exec-plus'
> would be used even if `gnu' was supported.
> 
> This change just swaps the two cases, so the faster `gnu' option
> is once again used in preference to `exec-plus'.  In my
> benchmarking on a large repository, this provides a ~40%
> speedup.

With how many files did you measure the 40% speedup?  Can you show the
performance with much fewer and much more files than what you used?  I
suspect that the effect depends on that.  (It also depends on the
system limit on the number of files and the length of the command line
that xargs can use.)  The argument about 'find' waiting is no longer
relevant with 'exec-plus', since in most cases there will be just one
invocation of 'grep'.

In any case, please modify the patch so that 'exec-plus' is still
preferred on MS-Windows (because most Windows ports of xargs are IME
abysmally buggy, so better avoided as much as possible).

A comment there with the justification of the order will also be
appreciated.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 12:35:01 GMT) Full text and rfc822 format available.

Message #14 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>, Spencer Baugh <sbaugh <at> janestreet.com>
Cc: rgm <at> gnu.org, 71094 <at> debbugs.gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 15:34:06 +0300
On 22/05/2024 14:59, Eli Zaretskii wrote:

> With how many files did you measure the 40% speedup?  Can you show the
> performance with much fewer and much more files than what you used?

FWIW my test indicated that for a smaller project (such as Emacs) the 
difference is fairly small - the new code is slightly better or the same.

The directory where I saw significant improvement has 300K files.

> I
> suspect that the effect depends on that.  (It also depends on the
> system limit on the number of files and the length of the command line
> that xargs can use.)  The argument about 'find' waiting is no longer
> relevant with 'exec-plus', since in most cases there will be just one
> invocation of 'grep'.

If there's just one invocation, wouldn't that mean that it will happen 
at the end of the full directory scan? Rather than in parallel.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 12:55:02 GMT) Full text and rfc822 format available.

Message #17 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 71094 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 08:54:25 -0400
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Cc: Glenn Morris <rgm <at> gnu.org>, dmitry <at> gutov.dev
>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Date: Tue, 21 May 2024 10:35:07 -0400
>> 
>> grep.el prefers to run "find" and "xargs grep" in a pipeline,
>> which means that "find" can continue searching the filesystem
>> while "xargs grep" searches files.  If find and xargs don't
>> support the flags required for this behavior, grep.el will fall
>> back to using the -exec flags to "find", which means "find" will
>> wait for each "grep" process to complete before continuing to
>> search the filesystem tree.  This behavior is controlled by
>> grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
>> the slower fallback.
>> 
>> In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
>> option was added for grep-find-use-xargs, which improves on
>> `exec' by running one "grep" process to search multiple files,
>> which `gnu' (by using xargs) already did.  However, the change
>> erroneously added the `exec-plus' case before the `gnu' case in
>> the autodetection code in grep-compute-defaults, so `exec-plus'
>> would be used even if `gnu' was supported.
>> 
>> This change just swaps the two cases, so the faster `gnu' option
>> is once again used in preference to `exec-plus'.  In my
>> benchmarking on a large repository, this provides a ~40%
>> speedup.
>
> With how many files did you measure the 40% speedup?

700k

> Can you show the performance with much fewer and much more files than
> what you used?

Much more is maybe hard, but much fewer is easy: with 212 files (a
subset of the original directory I searched), there's no performance
change.

> I suspect that the effect depends on that.  (It also depends on the
> system limit on the number of files and the length of the command line
> that xargs can use.)  The argument about 'find' waiting is no longer
> relevant with 'exec-plus', since in most cases there will be just one
> invocation of 'grep'.

True, it only matters when the directory tree contains more files than
can be passed to a single invocation of grep.

> In any case, please modify the patch so that 'exec-plus' is still
> preferred on MS-Windows (because most Windows ports of xargs are IME
> abysmally buggy, so better avoided as much as possible).
>
> A comment there with the justification of the order will also be
> appreciated.

Done, attached.

[0001-Prefer-to-run-find-and-grep-in-parallel-in-rgrep.patch (text/x-patch, inline)]
From e7fbfe431ae1f4f004f1d92db2f3b011b30ff682 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Tue, 21 May 2024 10:32:45 -0400
Subject: [PATCH] Prefer to run find and grep in parallel in rgrep

grep.el prefers to run "find" and "xargs grep" in a pipeline,
which means that "find" can continue searching the filesystem
while "xargs grep" searches files.  If find and xargs don't
support the flags required for this behavior, grep.el will fall
back to using the -exec flags to "find", which means "find" will
wait for each "grep" process to complete before continuing to
search the filesystem tree.  This behavior is controlled by
grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
the slower fallback.

In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
option was added for grep-find-use-xargs, which improves on
`exec' by running one "grep" process to search multiple files,
which `gnu' (by using xargs) already did.  However, the change
erroneously added the `exec-plus' case before the `gnu' case in
the autodetection code in grep-compute-defaults, so `exec-plus'
would be used even if `gnu' was supported.

This change just swaps the two cases, so the faster `gnu' option
is once again used in preference to `exec-plus'.  In my
benchmarking on a large repository, this provides a ~40%
speedup.

Also, we completely avoid running xargs on MS-Windows, because Eli
Zaretskii <eliz <at> gnu.org> writes:

> most Windows ports of xargs are IME abysmally buggy, so better avoided
> as much as possible

* lisp/progmodes/grep.el (grep-compute-defaults): Prefer `gnu' for
grep-find-use-xargs over `exec-plus', but not on Windows.  (bug#71094)
---
 lisp/progmodes/grep.el | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el
index 0a9de04fce1..ce54c57aabc 100644
--- a/lisp/progmodes/grep.el
+++ b/lisp/progmodes/grep.el
@@ -812,15 +812,23 @@ grep-compute-defaults
 	(unless grep-find-use-xargs
 	  (setq grep-find-use-xargs
 		(cond
-		 ((grep-probe find-program
-			      `(nil nil nil ,(null-device) "-exec" "echo"
-				    "{}" "+"))
-		  'exec-plus)
+                 ;; For performance, we want:
+                 ;; A. Run grep on batches of files (instead of one grep per file)
+                 ;; B. If the directory is large and we need multiple batches,
+                 ;;    run find in parallel with a running grep.
+                 ;; "find | xargs grep" gives both A and B
 		 ((and
+                   (not (eq system-type 'windows-nt))
 		   (grep-probe
                     find-program `(nil nil nil ,(null-device) "-print0"))
 		   (grep-probe xargs-program '(nil nil nil "-0" "echo")))
 		  'gnu)
+                 ;; "find -exec {} +" gives A but not B
+		 ((grep-probe find-program
+			      `(nil nil nil ,(null-device) "-exec" "echo"
+				    "{}" "+"))
+		  'exec-plus)
+                 ;; "find -exec {} ;" gives neither A nor B.
 		 (t
 		  'exec))))
 	(unless grep-find-command
-- 
2.39.3


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 13:52:02 GMT) Full text and rfc822 format available.

Message #20 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 16:50:57 +0300
> Date: Wed, 22 May 2024 15:34:06 +0300
> Cc: 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
> 
> On 22/05/2024 14:59, Eli Zaretskii wrote:
> 
> > With how many files did you measure the 40% speedup?  Can you show the
> > performance with much fewer and much more files than what you used?
> 
> FWIW my test indicated that for a smaller project (such as Emacs) the 
> difference is fairly small - the new code is slightly better or the same.
> 
> The directory where I saw significant improvement has 300K files.

That's what I thought.  So we are changing the decade-old defaults to
favor huge directories, which is not necessarily the wisest thing to
do.

> > I
> > suspect that the effect depends on that.  (It also depends on the
> > system limit on the number of files and the length of the command line
> > that xargs can use.)  The argument about 'find' waiting is no longer
> > relevant with 'exec-plus', since in most cases there will be just one
> > invocation of 'grep'.
> 
> If there's just one invocation, wouldn't that mean that it will happen 
> at the end of the full directory scan? Rather than in parallel.

That's true, but what is your mental model of how the pipe with xargs
works in practice?  How many invocations of grep will xargs do, and
when will the first invocation happen?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 14:24:02 GMT) Full text and rfc822 format available.

Message #23 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 17:22:56 +0300
On 22/05/2024 16:50, Eli Zaretskii wrote:
>> Date: Wed, 22 May 2024 15:34:06 +0300
>> Cc: 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
>> From: Dmitry Gutov <dmitry <at> gutov.dev>
>>
>> On 22/05/2024 14:59, Eli Zaretskii wrote:
>>
>>> With how many files did you measure the 40% speedup?  Can you show the
>>> performance with much fewer and much more files than what you used?
>>
>> FWIW my test indicated that for a smaller project (such as Emacs) the
>> difference is fairly small - the new code is slightly better or the same.
>>
>> The directory where I saw significant improvement has 300K files.
> 
> That's what I thought.  So we are changing the decade-old defaults to
> favor huge directories, which is not necessarily the wisest thing to
> do.

I don't see any regression on small directories, though. And an 
improvement on big ones.

So the way I see it, we're expanding Emacs's applicability to wider 
audience without any apparent drawbacks.

It might actually give us an improvement in smaller projects as well, if 
we decrease xargs's batch size (with -s or -n). But those are fairly 
fast already, so it's not critical.

>>> I
>>> suspect that the effect depends on that.  (It also depends on the
>>> system limit on the number of files and the length of the command line
>>> that xargs can use.)  The argument about 'find' waiting is no longer
>>> relevant with 'exec-plus', since in most cases there will be just one
>>> invocation of 'grep'.
>>
>> If there's just one invocation, wouldn't that mean that it will happen
>> at the end of the full directory scan? Rather than in parallel.
> 
> That's true, but what is your mental model of how the pipe with xargs
> works in practice?  How many invocations of grep will xargs do, and
> when will the first invocation happen?

In my mental model xargs acts like an asynchronous queue with batch
processing. The first invocation will happen after the output reaches
the maximum command-line length or the maximum number of arguments
configured. They are system-dependent by default.

For example, on my system 'xargs --show-limits' says

  Size of command buffer we are actually using: 131072

Whereas in the Emacs repository "find ... -print0 | wc" reports 202928 
characters. Meaning, it uses just 1.5 'grep' invocations. To see better 
parallelism there we'll need to either lower the limit or test it in a 
project at least twice as big.

So here is another example: a Linux kernel checkout (76K files). Also 
about 30% improvement: 1.40s vs 2.00s.
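
The "1.5 invocations" estimate above is the size of the file-name list
divided by the xargs command buffer.  A rough sketch of that
arithmetic, using only the two numbers quoted above; the variable
names are invented here, and real batch boundaries also depend on the
environment size and the fixed part of the command line, so this is an
approximation.  The second half uses -n (one of the two options
mentioned earlier) to make the batching visible on a toy input.

```shell
#!/bin/sh
names_bytes=202928     # from "find ... -print0 | wc" above
buffer_bytes=131072    # from "xargs --show-limits" above

# Ceiling division: how many grep invocations xargs needs when each
# batch can hold at most buffer_bytes of file names.
batches=$(( (names_bytes + buffer_bytes - 1) / buffer_bytes ))
echo "grep invocations: $batches"

# Batching on a toy input: -n 2 caps each invocation at two arguments,
# so every line of output corresponds to one invocation of echo.
demo_lines=$(printf '%s\0' a b c d e | xargs -0 -n 2 echo | wc -l | tr -d ' ')
echo "echo invocations with -n 2: $demo_lines"
```

With a limit this large relative to the input, the first grep cannot
start until most of the tree has been walked, which matches the small
difference seen on Emacs-sized projects.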




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 14:44:02 GMT) Full text and rfc822 format available.

Message #26 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 17:42:50 +0300
> Date: Wed, 22 May 2024 17:22:56 +0300
> Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
> 
> >> The directory where I saw significant improvement has 300K files.
> > 
> > That's what I thought.  So we are changing the decade-old defaults to
> > favor huge directories, which is not necessarily the wisest thing to
> > do.
> 
> I don't see any regression on small directories, though. And an 
> improvement on big ones.

On your system.

> > That's true, but what is your mental model of how the pipe with xargs
> > works in practice?  How many invocations of grep will xargs do, and
> > when will the first invocation happen?
> 
> In my mental model xargs acts like an asynchronous queue with batch
> processing. The first invocation will happen after the output reaches
> the maximum command-line length or the maximum number of arguments
> configured. They are system-dependent by default.

And can be rather small.  But if it is large, then...

> For example, on my system 'xargs --show-limits' says
> 
>    Size of command buffer we are actually using: 131072
> 
> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928 
> characters. Meaning, it uses just 1.5 'grep' invocations. To see better 
> parallelism there we'll need to either lower the limit or test it in a 
> project at least twice as big.

...until xargs collects all those characters, it will not invoke grep,
right?  So, for directories whose file names total less than those
200K, xargs will still wait until find ends its job, right?

> So here is another example: a Linux kernel checkout (76K files). Also 
> about 30% improvement: 1.40s vs 2.00s.

This is all highly system-dependent.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 14:51:01 GMT) Full text and rfc822 format available.

Message #29 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 17:50:42 +0300
On 22/05/2024 17:42, Eli Zaretskii wrote:
>>> That's true, but what is your mental model of how the pipe with xargs
>>> works in practice?  How many invocations of grep will xargs do, and
>>> when will the first invocation happen?
>>
>> In my mental model xargs acts like an asynchronous queue with batch
>> processing. The first invocation will happen after the output reaches
>> the maximum command-line length or the maximum number of arguments
>> configured. They are system-dependent by default.
> 
> And can be rather small.  But if it is large, then...
> 
>> For example, on my system 'xargs --show-limits' says
>>
>>     Size of command buffer we are actually using: 131072
>>
>> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
>> characters. Meaning, it uses just 1.5 'grep' invocations. To see better
>> parallelism there we'll need to either lower the limit or test it in a
>> project at least twice as big.
> 
> ...until xargs collects all those characters, it will not invoke grep,
> right?  So, for directories whose file names total less than those
> 200K, xargs will still wait until find ends its job, right?

That's right. And it's why we're not seeing much of a difference in 
projects of Emacs's size or smaller. No apparent regression either, though.

>> So here is another example: a Linux kernel checkout (76K files). Also
>> about 30% improvement: 1.40s vs 2.00s.
> 
> This is all highly system-dependent.

Naturally. So it'd be great to see some additional data points from 
users on other systems.

Especially those where the default limit is lower than it is on mine.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 15:28:02 GMT) Full text and rfc822 format available.

Message #32 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 18:26:45 +0300
> Date: Wed, 22 May 2024 17:50:42 +0300
> Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
> 
> >> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
> >> characters. Meaning, it uses just 1.5 'grep' invocations. To see better
> >> parallelism there we'll need to either lower the limit or test it in a
> >> project at least twice as big.
> > 
> > ...until xargs collects all those characters, it will not invoke grep,
> > right?  So, for directories whose file names total less than those
> > 200K, xargs will still wait until find ends its job, right?
> 
> That's right. And it's why we're not seeing much of a difference in 
> projects of Emacs's size or smaller. No apparent regression either, though.

But we added xargs to the soup.  On GNU/Linux, where GNU Findutils are
developed, it probably isn't a problem.  On other systems, not
necessarily...

> >> So here is another example: a Linux kernel checkout (76K files). Also
> >> about 30% improvement: 1.40s vs 2.00s.
> > 
> > This is all highly system-dependent.
> 
> Naturally. So it'd be great to see some additional data points from 
> users on other systems.
> 
> Especially those where the default limit is lower than it is on mine.

I'd be happy if someone could time these methods on MS-Windows and on
some *BSD system, at least.  Bonus points for macOS.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 17:49:01 GMT) Full text and rfc822 format available.

Message #35 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 20:47:57 +0300
On 22/05/2024 18:26, Eli Zaretskii wrote:

> I'd be happy if someone could time these methods on MS-Windows and on
> some *BSD system, at least.  Bonus points for macOS.

As luck would have it, I have an M3 Pro macOS laptop around.

The situation with it is odd, as usual. First of all, the default 
find/xargs/grep installed are some very slow versions from Apple.

The patch doesn't seem to change the performance of the search using 
them, it's just slow either way.

Things get better if I install the GNU versions from Homebrew and

   (setq grep-program "ggrep")

at startup. Performance gets better by 4x or so just from that, but
still not to the level of my 5-year-old GNU/Linux laptop. The patch
still doesn't seem to make a difference. If I also set

   (setq xargs-program "gxargs")

then the patch starts improving performance in a large directory (again: 
Linux kernel), by around 10%. Still more than 3x slower than on my older 
laptop with Linux. No idea why - the ggrep, gxargs and gfind executables 
are all reported to be arm64, so I can't blame the x64->arm64 
translation layer.

To sum up though, the patch under discussion doesn't make things worse 
on the macOS laptop I tested.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 18:07:02 GMT) Full text and rfc822 format available.

Message #38 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Manuel Giraud <manuel <at> ledu-giraud.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org,
 sbaugh <at> janestreet.com
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 20:06:44 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

[...]

>> >> So here is another example: a Linux kernel checkout (76K files). Also
>> >> about 30% improvement: 1.40s vs 2.00s.
>> > 
>> > This is all highly system-dependent.
>> 
>> Naturally. So it'd be great to see some additional data points from 
>> users on other systems.
>> 
>> Especially those where the default limit is lower than it is on mine.
>
> I'd be happy if someone could time these methods on MS-Windows and on
> some *BSD system, at least.  Bonus points for macOS.

I'm not sure this is what you asked for, but here are some numbers on
OpenBSD (native 'find' and 'xargs'):

$ time find ~/emacs-repo -type f -exec grep foo {} + > /dev/null
    0m04.09s real     0m03.29s user     0m00.74s system
$ time find ~/emacs-repo -type f -print0 | xargs -0 grep foo > /dev/null
    0m04.10s real     0m03.45s user     0m00.66s system    

$ find /usr/src -type f | wc -l
  114315
$ time find /usr/src -type f -exec grep foo {} + > /dev/null
    0m14.07s real     0m07.68s user     0m06.29s system
$ time find /usr/src -type f -print0 | xargs -0 grep foo > /dev/null
    0m13.83s real     0m07.94s user     0m06.25s system
-- 
Manuel Giraud




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 18:24:02 GMT) Full text and rfc822 format available.

Message #41 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 21:21:14 +0300
> Date: Wed, 22 May 2024 20:47:57 +0300
> Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
> 
> To sum up though, the patch under discussion doesn't make things worse 
> on the macOS laptop I tested.

Thanks, it's good to know.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#71094; Package emacs. (Wed, 22 May 2024 18:31:01 GMT) Full text and rfc822 format available.

Message #44 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Manuel Giraud <manuel <at> ledu-giraud.fr>
Cc: dmitry <at> gutov.dev, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org, sbaugh <at> janestreet.com
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 21:30:27 +0300
> From: Manuel Giraud <manuel <at> ledu-giraud.fr>
> Cc: Dmitry Gutov <dmitry <at> gutov.dev>,  sbaugh <at> janestreet.com,
>   71094 <at> debbugs.gnu.org,  rgm <at> gnu.org
> Date: Wed, 22 May 2024 20:06:44 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> [...]
> 
> >> >> So here is another example: a Linux kernel checkout (76K files). Also
> >> >> about 30% improvement: 1.40s vs 2.00s.
> >> > 
> >> > This is all highly system-dependent.
> >> 
> >> Naturally. So it'd be great to see some additional data points from 
> >> users on other systems.
> >> 
> >> Especially those where the default limit is lower than it is on mine.
> >
> > I'd be happy if someone could time these methods on MS-Windows and on
> > some *BSD system, at least.  Bonus points for macOS.
> 
> I'm not sure this is what you asked for, but here are some numbers on
> OpenBSD (native 'find' and 'xargs'):
> 
> $ time find ~/emacs-repo -type f -exec grep foo {} + > /dev/null
>     0m04.09s real     0m03.29s user     0m00.74s system
> $ time find ~/emacs-repo -type f -print0 | xargs -0 grep foo > /dev/null
>     0m04.10s real     0m03.45s user     0m00.66s system    
> 
> $ find /usr/src -type f | wc -l
>   114315
> $ time find /usr/src -type f -exec grep foo {} + > /dev/null
>     0m14.07s real     0m07.68s user     0m06.29s system
> $ time find /usr/src -type f -print0 | xargs -0 grep foo > /dev/null
>     0m13.83s real     0m07.94s user     0m06.25s system

Thanks, but we need the timings of the corresponding Emacs commands,
not the commands run from the shell prompt.

Btw, are you sure that xargs or grep don't pay attention to the fact
that their output is redirected to the null device, and do nothing?
Some variants of these commands are known to use such a trick, AFAIR.
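
One way to sidestep the null-device question when timing from a shell
is to pipe the output into `wc -l` instead of redirecting it to
/dev/null, so that grep writes to a pipe and has to emit every matching
line.  A minimal sketch, assuming a POSIX shell and find/xargs
implementations that accept -print0 and -0 as in the commands quoted
above; the scratch directory and its files are made up for illustration:

```shell
# Hypothetical scratch tree; substitute a real checkout when benchmarking.
dir=$(mktemp -d)
printf 'foo\nbar\n' > "$dir/a.txt"
printf 'baz\n' > "$dir/b.txt"
printf 'foo again\n' > "$dir/c.txt"

# Pipe to wc instead of redirecting to /dev/null, so the output is
# actually produced.  tr strips the padding some wc variants add.
# Prefix each pipeline with `time` when measuring.
exec_plus_count=$(find "$dir" -type f -exec grep foo {} + | wc -l | tr -d ' ')
xargs_count=$(find "$dir" -type f -print0 | xargs -0 grep foo | wc -l | tr -d ' ')

echo "exec-plus: $exec_plus_count matching lines"
echo "xargs:     $xargs_count matching lines"

rm -rf "$dir"
```

If both variants report the same line count, neither one skipped the
search.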





Message #47 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Manuel Giraud <manuel <at> ledu-giraud.fr>, Eli Zaretskii <eliz <at> gnu.org>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 21:51:20 +0300
On 22/05/2024 21:06, Manuel Giraud wrote:
>> I'd be happy if someone could time these methods on MS-Windows and on
>> some *BSD system, at least.  Bonus points for macOS.
> I'm not sure this is what you asked for, but here are some numbers on
> OpenBSD (native 'find' and 'xargs'):
> 
> $ time find ~/emacs-repo -type f -exec grep foo {} + > /dev/null
>      0m04.09s real     0m03.29s user     0m00.74s system
> $ time find ~/emacs-repo -type f -print0 | xargs -0 grep foo > /dev/null
>      0m04.10s real     0m03.45s user     0m00.66s system
> 
> $ find /usr/src -type f | wc -l
>    114315
> $ time find /usr/src -type f -exec grep foo {} + > /dev/null
>      0m14.07s real     0m07.68s user     0m06.29s system
> $ time find /usr/src -type f -print0 | xargs -0 grep foo > /dev/null
>      0m13.83s real     0m07.94s user     0m06.25s system

I'm not sure how many matches for 'foo' there are inside your /usr/src, 
but if there are a lot, it slows down the last phase (grep output), 
making the performance gains a wash.

For this particular scenario, it's better to search for a string with no 
matches. Then you won't need to redirect to /dev/null too.
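
The suggestion above can be sketched as follows.  This is only an
illustration, assuming a POSIX shell and find/xargs that accept -print0
and -0; the scratch directory and the search string are hypothetical:

```shell
# Hypothetical scratch tree; substitute the directory being benchmarked.
dir=$(mktemp -d)
printf 'foo\n' > "$dir/a.txt"
printf 'bar\n' > "$dir/b.txt"

# A string chosen to be very unlikely to occur; -F treats it literally.
pattern='zzq_no_such_string_9064'

# With no matches there is nothing to print, so no redirection is needed.
# grep exits non-zero when nothing matches, hence the `|| true`.
# Prefix each command with `time` when measuring.
exec_plus_out=$(find "$dir" -type f -exec grep -F -e "$pattern" {} +) || true
xargs_out=$(find "$dir" -type f -print0 | xargs -0 grep -F -e "$pattern") || true

rm -rf "$dir"
```

Both variables end up empty, so the output phase adds nothing to either
timing.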





Message #50 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Manuel Giraud <manuel <at> ledu-giraud.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org, sbaugh <at> janestreet.com
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 21:15:02 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

[...]

> Thanks, but we need the timings of the corresponding Emacs commands,
> not the commands run from the shell prompt.

Ok.  What are those commands, and how do I time them?

> Btw, are you sure that xargs or grep don't pay attention to the fact
> that their output is redirected to the null device, and do nothing?
> Some variants of these commands are known to use such a trick, AFAIR.

I don't know.
-- 
Manuel Giraud





Message #53 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Manuel Giraud <manuel <at> ledu-giraud.fr>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: sbaugh <at> janestreet.com, Eli Zaretskii <eliz <at> gnu.org>, 71094 <at> debbugs.gnu.org,
 rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 21:36:20 +0200
Dmitry Gutov <dmitry <at> gutov.dev> writes:

> On 22/05/2024 21:06, Manuel Giraud wrote:
>>> I'd be happy if someone could time these methods on MS-Windows and on
>>> some *BSD system, at least.  Bonus points for macOS.
>> I'm not sure this is what you asked for, but here are some numbers on
>> OpenBSD (native 'find' and 'xargs'):
>> $ time find ~/emacs-repo -type f -exec grep foo {} + > /dev/null
>>      0m04.09s real     0m03.29s user     0m00.74s system
>> $ time find ~/emacs-repo -type f -print0 | xargs -0 grep foo > /dev/null
>>      0m04.10s real     0m03.45s user     0m00.66s system
>> $ find /usr/src -type f | wc -l
>>    114315
>> $ time find /usr/src -type f -exec grep foo {} + > /dev/null
>>      0m14.07s real     0m07.68s user     0m06.29s system
>> $ time find /usr/src -type f -print0 | xargs -0 grep foo > /dev/null
>>      0m13.83s real     0m07.94s user     0m06.25s system
>
> I'm not sure how many matches for 'foo' there are inside your
> /usr/src, but if there are a lot, it slows down the last phase (grep
> output), making the performance gains a wash.
>
> For this particular scenario, it's better to search for a string with
> no matches. Then you won't need to redirect to /dev/null too.

Ok, good to know.  Here are some new numbers:

$ time find /usr/src -type f -exec grep "DIZ_-{)9064gd" {} +
    0m11.74s real     0m05.54s user     0m06.14s system
$ time find /usr/src -type f -print0 | xargs -0 grep "DIZ_-{)9064gd"
    0m11.70s real     0m05.59s user     0m06.50s system
-- 
Manuel Giraud





Message #56 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Manuel Giraud <manuel <at> ledu-giraud.fr>
Cc: sbaugh <at> janestreet.com, Eli Zaretskii <eliz <at> gnu.org>, 71094 <at> debbugs.gnu.org,
 rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 22 May 2024 22:59:28 +0300
On 22/05/2024 22:36, Manuel Giraud wrote:
> Ok, good to know.  Here are some new numbers:
> 
> $ time find /usr/src -type f -exec grep "DIZ_-{)9064gd" {} +
>      0m11.74s real     0m05.54s user     0m06.14s system
> $ time find /usr/src -type f -print0 | xargs -0 grep "DIZ_-{)9064gd"
>      0m11.70s real     0m05.59s user     0m06.50s system

Looks about the same. I think that helps, thank you.





Message #59 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Manuel Giraud <manuel <at> ledu-giraud.fr>
Cc: dmitry <at> gutov.dev, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org, sbaugh <at> janestreet.com
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Thu, 23 May 2024 07:46:12 +0300
> From: Manuel Giraud <manuel <at> ledu-giraud.fr>
> Cc: dmitry <at> gutov.dev,  sbaugh <at> janestreet.com,  71094 <at> debbugs.gnu.org,
>   rgm <at> gnu.org
> Date: Wed, 22 May 2024 21:15:02 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> [...]
> 
> > Thanks, but we need the timings of the corresponding Emacs commands,
> > not the commands run from the shell prompt.
> 
> Ok.  What are those commands, and how do I time them?

It's rgrep, AFAIU, according to the original report in this bug's
discussion.  Dmitry, would you please show Manuel the commands you
were running in your benchmarks?

> > Btw, are you sure that xargs or grep don't pay attention to the fact
> > that their output is redirected to the null device, and do nothing?
> > Some variants of these commands are known to use such a trick, AFAIR.
> 
> I don't know.

OK, thanks.





Message #62 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>, Manuel Giraud <manuel <at> ledu-giraud.fr>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Thu, 23 May 2024 16:24:11 +0300
On 23/05/2024 07:46, Eli Zaretskii wrote:
>> From: Manuel Giraud<manuel <at> ledu-giraud.fr>
>> Cc:dmitry <at> gutov.dev,sbaugh <at> janestreet.com,71094 <at> debbugs.gnu.org,
>>    rgm <at> gnu.org
>> Date: Wed, 22 May 2024 21:15:02 +0200
>>
>> Eli Zaretskii<eliz <at> gnu.org>  writes:
>>
>> [...]
>>
>>> Thanks, but we need the timings of the corresponding Emacs commands,
>>> not the commands run from the shell prompt.
>> Ok.  What are those commands, and how do I time them?
> It's rgrep, AFAIU, according to the original report in this bug's
> discussion.  Dmitry, would you please show Manuel the commands you
> were running in your benchmarks?

1. Visit a directory with a fair number of files.
2. M-x rgrep, enter some odd regexp like "asdfasf@!#!" and "*" for the 
files wildcard.
3. Perform the search. Look at the end of the *grep* output, it will say 
something like "duration: 4.52 s". Note the number.

And you could repeat the same after applying the patch, recompiling 
Emacs (or at least grep.el) and restarting.

Preferably do the scenario 2-3 times to ensure that the filesystem cache 
is warm, and cold file access speed doesn't skew the numbers.
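
For a quick shell-side sanity check of the cache-warming advice (not a
substitute for the rgrep timing described above), the same search can
be repeated in a loop and only the later runs compared.  A rough
sketch, assuming a POSIX shell and a `date` that supports `+%s` (common
but not universal); the scratch directory is hypothetical, and
whole-second resolution is only useful on large trees:

```shell
# Hypothetical scratch tree; substitute the directory being benchmarked.
dir=$(mktemp -d)
printf 'foo\n' > "$dir/a.txt"

runs=0
for i in 1 2 3; do
    start=$(date +%s)
    # grep exits non-zero when nothing matches, hence the `|| true`.
    find "$dir" -type f -print0 | xargs -0 grep -F -e 'asdfasf@!#!' || true
    end=$(date +%s)
    # The first run may include cold-cache disk access; compare later runs.
    echo "run $i: $((end - start)) s"
    runs=$((runs + 1))
done

rm -rf "$dir"
```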





Message #65 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Manuel Giraud <manuel <at> ledu-giraud.fr>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: sbaugh <at> janestreet.com, Eli Zaretskii <eliz <at> gnu.org>, 71094 <at> debbugs.gnu.org,
 rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Fri, 24 May 2024 19:44:41 +0200
Dmitry Gutov <dmitry <at> gutov.dev> writes:

> On 23/05/2024 07:46, Eli Zaretskii wrote:
>>> From: Manuel Giraud<manuel <at> ledu-giraud.fr>
>>> Cc:dmitry <at> gutov.dev,sbaugh <at> janestreet.com,71094 <at> debbugs.gnu.org,
>>>    rgm <at> gnu.org
>>> Date: Wed, 22 May 2024 21:15:02 +0200
>>>
>>> Eli Zaretskii<eliz <at> gnu.org>  writes:
>>>
>>> [...]
>>>
>>>> Thanks, but we need the timings of the corresponding Emacs commands,
>>>> not the commands run from the shell prompt.
>>> Ok.  What are those commands, and how do I time them?
>> It's rgrep, AFAIU, according to the original report in this bug's
>> discussion.  Dmitry, would you please show Manuel the commands you
>> were running in your benchmarks?
>
> 1. Visit a directory with a fair number of files.
> 2. M-x rgrep, enter some odd regexp like "asdfasf@!#!" and "*" for the
> files wildcard.
> 3. Perform the search. Look at the end of the *grep* output, it will
> say something like "duration: 4.52 s". Note the number.

Thanks.  Here is what I get first without and then with the patch after
cache warming on both runs:

--8<---------------cut here---------------start------------->8---
-*- mode: grep; default-directory: "/usr/src/" -*-
Grep started at Fri May 24 19:04:44

find -H . -type d \( -path \*/SCCS -o -path \*/RCS -o -path \*/CVS -o -path \*/MCVS -o -path \*/.src -o -path \*/.svn -o -path \*/.git -o -path \*/.hg -o -path \*/.bzr -o -path \*/_MTN -o -path \*/_darcs -o -path \*/\{arch\} \) -prune -o \! -type d \( -name .\#\* -o -name \*.o -o -name \*\~ -o -name \*.bin -o -name \*.lbin -o -name \*.so -o -name \*.a -o -name \*.ln -o -name \*.blg -o -name \*.bbl -o -name \*.elc -o -name \*.lof -o -name \*.glo -o -name \*.idx -o -name \*.lot -o -name \*.fmt -o -name \*.tfm -o -name \*.class -o -name \*.fas -o -name \*.lib -o -name \*.mem -o -name \*.x86f -o -name \*.sparcf -o -name \*.dfsl -o -name \*.pfsl -o -name \*.d64fsl -o -name \*.p64fsl -o -name \*.lx64fsl -o -name \*.lx32fsl -o -name \*.dx64fsl -o -name \*.dx32fsl -o -name \*.fx64fsl -o -name \*.fx32fsl -o -name \*.sx64fsl -o -name \*.sx32fsl -o -name \*.wx64fsl -o -name \*.wx32fsl -o -name \*.fasl -o -name \*.ufsl -o -name \*.fsl -o -name \*.dxl -o -name \*.lo -o -name \*.la -o -name \*.gmo -o -name \*.mo -o -name \*.toc -o -name \*.aux -o -name \*.cp -o -name \*.fn -o -name \*.ky -o -name \*.pg -o -name \*.tp -o -name \*.vr -o -name \*.cps -o -name \*.fns -o -name \*.kys -o -name \*.pgs -o -name \*.tps -o -name \*.vrs -o -name \*.pyc -o -name \*.pyo \) -prune -o  -type f \( -name \* -o -name .\* \) -exec grep -i -nH --null -e asdfasf\@\!\#\! \{\} +

Grep finished with no matches found at Fri May 24 19:04:54, duration 10.2 s
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
-*- mode: grep; default-directory: "/usr/src/" -*-
Grep started at Fri May 24 19:37:28

find -H . -type d \( -path \*/SCCS -o -path \*/RCS -o -path \*/CVS -o -path \*/MCVS -o -path \*/.src -o -path \*/.svn -o -path \*/.git -o -path \*/.hg -o -path \*/.bzr -o -path \*/_MTN -o -path \*/_darcs -o -path \*/\{arch\} \) -prune -o \! -type d \( -name .\#\* -o -name \*.o -o -name \*\~ -o -name \*.bin -o -name \*.lbin -o -name \*.so -o -name \*.a -o -name \*.ln -o -name \*.blg -o -name \*.bbl -o -name \*.elc -o -name \*.lof -o -name \*.glo -o -name \*.idx -o -name \*.lot -o -name \*.fmt -o -name \*.tfm -o -name \*.class -o -name \*.fas -o -name \*.lib -o -name \*.mem -o -name \*.x86f -o -name \*.sparcf -o -name \*.dfsl -o -name \*.pfsl -o -name \*.d64fsl -o -name \*.p64fsl -o -name \*.lx64fsl -o -name \*.lx32fsl -o -name \*.dx64fsl -o -name \*.dx32fsl -o -name \*.fx64fsl -o -name \*.fx32fsl -o -name \*.sx64fsl -o -name \*.sx32fsl -o -name \*.wx64fsl -o -name \*.wx32fsl -o -name \*.fasl -o -name \*.ufsl -o -name \*.fsl -o -name \*.dxl -o -name \*.lo -o -name \*.la -o -name \*.gmo -o -name \*.mo -o -name \*.toc -o -name \*.aux -o -name \*.cp -o -name \*.fn -o -name \*.ky -o -name \*.pg -o -name \*.tp -o -name \*.vr -o -name \*.cps -o -name \*.fns -o -name \*.kys -o -name \*.pgs -o -name \*.tps -o -name \*.vrs -o -name \*.pyc -o -name \*.pyo \) -prune -o  -type f \( -name \* -o -name .\* \) -print0 | "xargs" -0 grep -i -nH --null -e asdfasf\@\!\#\!

Grep finished with no matches found at Fri May 24 19:37:37, duration 9.01 s
--8<---------------cut here---------------end--------------->8---
-- 
Manuel Giraud





Message #68 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: rgm <at> gnu.org, 71094 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Sun, 26 May 2024 12:47:32 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: rgm <at> gnu.org,  71094 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
> Date: Wed, 22 May 2024 08:54:25 -0400
> 
> > In any case, please modify the patch so that 'exec-plus' is still
> > preferred on MS-Windows (because most Windows ports of xargs are IME
> > abysmally buggy, so better avoided as much as possible).
> >
> > A comment there with the justification of the order will also be
> > appreciated.
> 
> Done, attached.

Thanks, LGTM.





Message #71 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Manuel Giraud <manuel <at> ledu-giraud.fr>
Cc: sbaugh <at> janestreet.com, Eli Zaretskii <eliz <at> gnu.org>, 71094 <at> debbugs.gnu.org,
 rgm <at> gnu.org
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Sun, 26 May 2024 18:57:55 +0300
On 24/05/2024 20:44, Manuel Giraud wrote:
> Thanks.  Here is what I get first without and then with the patch after
> cache warming on both runs:
> 
> ...
> Grep finished with no matches found at Fri May 24 19:04:54, duration
> 10.2 s
> 
> ...
> Grep finished with no matches found at Fri May 24 19:37:37, duration
> 9.01 s

That looks like a slight improvement as well, nice.





Message #74 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 71094 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Thu, 30 May 2024 08:29:04 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: rgm <at> gnu.org,  71094 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
>> Date: Wed, 22 May 2024 08:54:25 -0400
>> 
>> > In any case, please modify the patch so that 'exec-plus' is still
>> > preferred on MS-Windows (because most Windows ports of xargs are IME
>> > abysmally buggy, so better avoided as much as possible).
>> >
>> > A comment there with the justification of the order will also be
>> > appreciated.
>> 
>> Done, attached.
>
> Thanks, LGTM.

So is this OK to install now?  We have had benchmarking on several
platforms and they all report either a performance improvement or no
change.





Message #77 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: rgm <at> gnu.org, 71094 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Thu, 30 May 2024 17:52:25 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: rgm <at> gnu.org,  71094 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
> Date: Thu, 30 May 2024 08:29:04 -0400
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> >> From: Spencer Baugh <sbaugh <at> janestreet.com>
> >> Cc: rgm <at> gnu.org,  71094 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
> >> Date: Wed, 22 May 2024 08:54:25 -0400
> >> 
> >> > In any case, please modify the patch so that 'exec-plus' is still
> >> > preferred on MS-Windows (because most Windows ports of xargs are IME
> >> > abysmally buggy, so better avoided as much as possible).
> >> >
> >> > A comment there with the justification of the order will also be
> >> > appreciated.
> >> 
> >> Done, attached.
> >
> > Thanks, LGTM.
> 
> So is this OK to install now?

As far as I'm concerned, yes.  I don't know if the other participants
of this discussion are okay with this or they still have some
comments.





Message #80 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 71094 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Fri, 28 Jun 2024 10:03:01 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: rgm <at> gnu.org,  71094 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
>> Date: Thu, 30 May 2024 08:29:04 -0400
>> 
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>> 
>> >> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> >> Cc: rgm <at> gnu.org,  71094 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
>> >> Date: Wed, 22 May 2024 08:54:25 -0400
>> >> 
>> >> > In any case, please modify the patch so that 'exec-plus' is still
>> >> > preferred on MS-Windows (because most Windows ports of xargs are IME
>> >> > abysmally buggy, so better avoided as much as possible).
>> >> >
>> >> > A comment there with the justification of the order will also be
>> >> > appreciated.
>> >> 
>> >> Done, attached.
>> >
>> > Thanks, LGTM.
>> 
>> So is this OK to install now?
>
> As far as I'm concerned, yes.  I don't know if the other participants
> of this discussion are okay with this or they still have some
> comments.

Since no-one else has made any comments, could this be installed now?





Message #83 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, 71094 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Sat, 29 Jun 2024 22:07:30 -0700
Spencer Baugh <sbaugh <at> janestreet.com> writes:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>>> Cc: rgm <at> gnu.org,  71094 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
>>> Date: Thu, 30 May 2024 08:29:04 -0400
>>>
>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>
>>> > Thanks, LGTM.
>>>
>>> So is this OK to install now?
>>
>> As far as I'm concerned, yes.  I don't know if the other participants
>> of this discussion are okay with this or they still have some
>> comments.
>
> Since no-one else has made any comments, could this be installed now?

Yes, please go ahead, and thanks.





Message #86 received at 71094 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: rgm <at> gnu.org, Eli Zaretskii <eliz <at> gnu.org>, 71094 <at> debbugs.gnu.org,
 dmitry <at> gutov.dev
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 03 Jul 2024 08:53:45 -0400
Stefan Kangas <stefankangas <at> gmail.com> writes:
> Spencer Baugh <sbaugh <at> janestreet.com> writes:
>
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>>>> Cc: rgm <at> gnu.org,  71094 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
>>>> Date: Thu, 30 May 2024 08:29:04 -0400
>>>>
>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>
>>>> > Thanks, LGTM.
>>>>
>>>> So is this OK to install now?
>>>
>>> As far as I'm concerned, yes.  I don't know if the other participants
>>> of this discussion are okay with this or they still have some
>>> comments.
>>
>> Since no-one else has made any comments, could this be installed now?
>
> Yes, please go ahead, and thanks.

I don't actually have commit access (AFAIK?) so someone else will need
to push it :)




Reply sent to Andrea Corallo <acorallo <at> gnu.org>:
You have taken responsibility. (Wed, 03 Jul 2024 13:46:02 GMT) Full text and rfc822 format available.

Notification sent to Spencer Baugh <sbaugh <at> janestreet.com>:
bug acknowledged by developer. (Wed, 03 Jul 2024 13:46:02 GMT) Full text and rfc822 format available.

Message #91 received at 71094-done <at> debbugs.gnu.org (full text, mbox):

From: Andrea Corallo <acorallo <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: rgm <at> gnu.org, Eli Zaretskii <eliz <at> gnu.org>, 71094-done <at> debbugs.gnu.org,
 Stefan Kangas <stefankangas <at> gmail.com>, dmitry <at> gutov.dev
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
Date: Wed, 03 Jul 2024 09:42:52 -0400
Spencer Baugh <sbaugh <at> janestreet.com> writes:

> Stefan Kangas <stefankangas <at> gmail.com> writes:
>> Spencer Baugh <sbaugh <at> janestreet.com> writes:
>>
>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>>>>> Cc: rgm <at> gnu.org,  71094 <at> debbugs.gnu.org,  dmitry <at> gutov.dev
>>>>> Date: Thu, 30 May 2024 08:29:04 -0400
>>>>>
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>>
>>>>> > Thanks, LGTM.
>>>>>
>>>>> So is this OK to install now?
>>>>
>>>> As far as I'm concerned, yes.  I don't know if the other participants
>>>> of this discussion are okay with this or they still have some
>>>> comments.
>>>
>>> Since no-one else has made any comments, could this be installed now?
>>
>> Yes, please go ahead, and thanks.
>
> I don't actually have commit access (AFAIK?) so someone else will need
> to push it :)

Should be done, please double check.

Closing meanwhile.

  Andrea




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 01 Aug 2024 11:24:09 GMT) Full text and rfc822 format available.

This bug report was last modified 199 days ago.


