GNU bug report logs - #71094
[PATCH] Prefer to run find and grep in parallel in rgrep


Package: emacs; Reported by: Spencer Baugh <sbaugh@HIDDEN>; Keywords: patch; dated Tue, 21 May 2024 14:36:01 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.

Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 18:30:45 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 14:30:45 2024
Received: from localhost ([127.0.0.1]:57503 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9qjd-00081E-Bb
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 14:30:45 -0400
Received: from eggs.gnu.org ([209.51.188.92]:41410)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1s9qjb-0007nh-T0
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 14:30:44 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1s9qjP-0004IO-CY; Wed, 22 May 2024 14:30:32 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=JmKZmda7uGbD7hK8vjUUeFudsSpmwHTtNiMvVc/eFnA=; b=eGjkd6/SvCWe
 uRa0BmavLnlwXGSB7EYkxphGMeumQ5NLGqcmzq7UQXRjVHByhiv+Y7r2PtG3ys28DvGWqhQYUCtVD
 EmkQ6aBvgZYMb3W2+snR+4XDv8vpsRMy80fO+72EPCgMNynguglgGQDKcWnKk9u1I6i9hVL68SY0P
 AkylSgqfCtvnOeJVVSCq6m1dG+685NK3dpu3vdQ7Uax1yXfP6NVikkxDrRBMY+mLyDUopT3JslqTK
 68kTI0p3Zq60e98o62WSPUxwQVW1SYxqM7EvrsLJxzhdctv16wQPjdqKgkogqYP5xGnulASfN8EpF
 i24rHGUrEDZXo5lQCBlyaw==;
Date: Wed, 22 May 2024 21:30:27 +0300
Message-Id: <86r0dt66nw.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Manuel Giraud <manuel@HIDDEN>
In-Reply-To: <87pltdbu17.fsf@HIDDEN> (message from Manuel Giraud on
 Wed, 22 May 2024 20:06:44 +0200)
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
 <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN>
 <868r026jlq.fsf@HIDDEN>
 <dc79bcff-a5db-45c7-97f5-352d569617d0@HIDDEN>
 <861q5t7vrp.fsf@HIDDEN>
 <10f62497-dfb1-4c46-b18a-6d1100de4b6a@HIDDEN>
 <86wmnl6f62.fsf@HIDDEN> <87pltdbu17.fsf@HIDDEN>
X-Spam-Score: -1.6 (-)
X-Debbugs-Envelope-To: 71094
Cc: dmitry@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN, sbaugh@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.6 (--)

> From: Manuel Giraud <manuel@HIDDEN>
> Cc: Dmitry Gutov <dmitry@HIDDEN>,  sbaugh@HIDDEN,
>   71094 <at> debbugs.gnu.org,  rgm@HIDDEN
> Date: Wed, 22 May 2024 20:06:44 +0200
> 
> Eli Zaretskii <eliz@HIDDEN> writes:
> 
> [...]
> 
> >> >> So here is another example: a Linux kernel checkout (76K files). Also
> >> >> about 30% improvement: 1.40s vs 2.00s.
> >> > 
> >> > This is all highly system-dependent.
> >> 
> >> Naturally. So it'd be great to see some additional data points from 
> >> users on other systems.
> >> 
> >> Especially those where the default limit is lower than it is on mine.
> >
> > I'd be happy if someone could time these methods on MS-Windows and on
> > some *BSD system, at least.  Bonus points for macOS.
> 
> I'm not sure it is what you asked for but here are some numbers on
> OpenBSD (native 'find' and 'xargs'):
> 
> $ time find ~/emacs-repo -type f -exec grep foo {} + > /dev/null
>     0m04.09s real     0m03.29s user     0m00.74s system
> $ time find ~/emacs-repo -type f -print0 | xargs -0 grep foo > /dev/null
>     0m04.10s real     0m03.45s user     0m00.66s system    
> 
> $ find /usr/src -type f | wc -l
>   114315
> $ time find /usr/src -type f -exec grep foo {} + > /dev/null
>     0m14.07s real     0m07.68s user     0m06.29s system
> $ time find /usr/src -type f -print0 | xargs -0 grep foo > /dev/null
>     0m13.83s real     0m07.94s user     0m06.25s system

Thanks, but we need the timings of the corresponding Emacs commands,
not the commands run from the shell prompt.

Btw, are you sure that xargs or grep don't pay attention to the fact
that their output is redirected to the null device, and do nothing?
Some variants of these commands are known to use such a trick, AFAIR.
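
One way to rule out that concern is to send grep's output into a pipe rather than to the null device, so that every matching line has to be written. GNU grep, for one, is understood to take a shortcut when it detects that its standard output is /dev/null; whether OpenBSD's grep does the same is not established in this thread. The sketch below is only an illustration: the function names and the "dir" and "pattern" parameters are made up for this example and are not part of the patch.

```shell
#!/bin/sh
# Count matching lines with each method. grep writes into a pipe, not to
# /dev/null, so it cannot skip producing its output.
# "dir" and "pattern" are placeholder names used only in this sketch.

# Method 1: find batches the file names itself and runs grep on each batch.
count_exec_plus() {
    dir=$1
    pattern=$2
    find "$dir" -type f -exec grep -e "$pattern" {} + | wc -l | tr -d ' '
}

# Method 2: find prints NUL-separated names and xargs runs grep on batches.
count_xargs() {
    dir=$1
    pattern=$2
    find "$dir" -type f -print0 | xargs -0 grep -e "$pattern" | wc -l | tr -d ' '
}
```

In a shell where `time` is a keyword (bash, for instance), `time count_exec_plus ~/emacs-repo foo` and `time count_xargs ~/emacs-repo foo` would then time the two methods with real output. This still measures the shell pipelines, not the corresponding Emacs commands.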




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#71094; Package emacs. Full text available.

Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 18:23:46 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 14:23:46 2024
Received: from localhost ([127.0.0.1]:57466 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9qcs-0005z4-3X
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 14:23:46 -0400
Received: from eggs.gnu.org ([209.51.188.92]:50594)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1s9qcp-0005yy-Jm
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 14:23:44 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1s9qaX-00039E-71; Wed, 22 May 2024 14:21:21 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=o+8exqQTyM2iNs0fgypPLxCR7sxg0DpIH/1bcyfQdQk=; b=cC7RRwzv7JYB
 I3o2+8nFlGjL/j28+YNDVlHInwBViJ7Cc92LmYktCoYuIWnYrckELA4HqERH5e/Nv5Xmz7chQDhJo
 IwpXePIxj5anP4BsXakXDEJcVnqM+jNKDAiUH+amI8cJiwM9MjFLuTsYljrpB5BfdohXZNnpkCQpQ
 FJ3HW8uZaZJt1JA7/9hh3YhSqSOx0oxug2yKa1gqGlwz53fdXhJyyDNiiKDjwKtRq8Ee7DHwnkGAd
 laJUNrLIyFOIPajJ8maDIgcx0T+i6UiLiywMP1bAGjyhi/mFXxYP1ITFv35D2tXRMgXR9VjNCKjjz
 6ulZ2OI1pNyr0A5phRas+A==;
Date: Wed, 22 May 2024 21:21:14 +0300
Message-Id: <86ttip6739.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dmitry@HIDDEN>
In-Reply-To: <73b2c595-9200-4381-ae0f-2c3e1a2b1f29@HIDDEN> (message from
 Dmitry Gutov on Wed, 22 May 2024 20:47:57 +0300)
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
 <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN> <868r026jlq.fsf@HIDDEN>
 <dc79bcff-a5db-45c7-97f5-352d569617d0@HIDDEN> <861q5t7vrp.fsf@HIDDEN>
 <10f62497-dfb1-4c46-b18a-6d1100de4b6a@HIDDEN> <86wmnl6f62.fsf@HIDDEN>
 <73b2c595-9200-4381-ae0f-2c3e1a2b1f29@HIDDEN>
X-Spam-Score: -1.6 (-)
X-Debbugs-Envelope-To: 71094
Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN
X-Spam-Score: -2.6 (--)

> Date: Wed, 22 May 2024 20:47:57 +0300
> Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN
> From: Dmitry Gutov <dmitry@HIDDEN>
> 
> To sum up though, the patch under discussion doesn't make things worse 
> on the macOS laptop I tested.

Thanks, it's good to know.





Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 18:06:57 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 14:06:57 2024
Received: from localhost ([127.0.0.1]:57379 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9qMb-0005q2-1H
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 14:06:57 -0400
Received: from ledu-giraud.fr ([51.159.28.247]:25875)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <manuel@HIDDEN>) id 1s9qMY-0005pw-KS
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 14:06:55 -0400
DKIM-Signature: v=1; a=ed25519-sha256; c=simple/simple; s=ed25519; bh=dCvbjo0R
 1QQRke/vf3socaM5EglOSUwjbG0I7JLV2+U=;
 h=date:references:in-reply-to:
 subject:cc:to:from; d=ledu-giraud.fr; b=2wTFtui+RVzJAbxFcuD2mRmZBpiOZe
 zrmq6a1TJoiiGB0nc+hHxtdbjoj0MXKdvxESXzq/D8JQqC35YICNShBg==
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; s=rsa; bh=dCvbjo0R1QQRke/v
 f3socaM5EglOSUwjbG0I7JLV2+U=;
 h=date:references:in-reply-to:subject:
 cc:to:from; d=ledu-giraud.fr; b=1I3AUDzkl5AghOQARpyyO7KIveU1Wky5tq6FRw
 HQbSxoIf3r5FjNOEqr8ZprvAmSUb2FMNexW3CJVIsqNYEjmQ700PirseHkcj3vPoeDcUdh
 0SFYqxo4GVuiRnzYjJ/QT8j40HtDYBfjcTFMtrfHZRJGYXipcjmCTSemlNejlFMNeqFJx+
 1R5JuvdoeXrI+7cHWZ/Ywy/2ldyFkyYhllSNUc7ixmqAjwXZYqeceEG507vWO31lE8W/gx
 tzhWTrJyyMHhNRNmnF5upf4a7O2ARSXnmtIuAWms+3LL5z6qzxuJfWDZcUZn/3mR6Lk4nq
 Whlr8caJ8nzEI9bv0gt+FsuQ==
Received: from computer (<unknown> [10.1.1.1])
 by ledu-giraud.fr (OpenSMTPD) with ESMTPSA id 0aff69f9
 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); 
 Wed, 22 May 2024 20:06:46 +0200 (CEST)
From: Manuel Giraud <manuel@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
In-Reply-To: <86wmnl6f62.fsf@HIDDEN> (Eli Zaretskii's message of "Wed, 22 May
 2024 18:26:45 +0300")
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
 <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN>
 <868r026jlq.fsf@HIDDEN>
 <dc79bcff-a5db-45c7-97f5-352d569617d0@HIDDEN>
 <861q5t7vrp.fsf@HIDDEN>
 <10f62497-dfb1-4c46-b18a-6d1100de4b6a@HIDDEN>
 <86wmnl6f62.fsf@HIDDEN>
Date: Wed, 22 May 2024 20:06:44 +0200
Message-ID: <87pltdbu17.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -0.0 (/)
X-Debbugs-Envelope-To: 71094
Cc: Dmitry Gutov <dmitry@HIDDEN>, 71094 <at> debbugs.gnu.org, rgm@HIDDEN,
 sbaugh@HIDDEN
X-Spam-Score: -1.0 (-)

Eli Zaretskii <eliz@HIDDEN> writes:

[...]

>> >> So here is another example: a Linux kernel checkout (76K files). Also
>> >> about 30% improvement: 1.40s vs 2.00s.
>> > 
>> > This is all highly system-dependent.
>> 
>> Naturally. So it'd be great to see some additional data points from 
>> users on other systems.
>> 
>> Especially those where the default limit is lower than it is on mine.
>
> I'd be happy if someone could time these methods on MS-Windows and on
> some *BSD system, at least.  Bonus points for macOS.

I'm not sure it is what you asked for but here are some numbers on
OpenBSD (native 'find' and 'xargs'):

$ time find ~/emacs-repo -type f -exec grep foo {} + > /dev/null
    0m04.09s real     0m03.29s user     0m00.74s system
$ time find ~/emacs-repo -type f -print0 | xargs -0 grep foo > /dev/null
    0m04.10s real     0m03.45s user     0m00.66s system    

$ find /usr/src -type f | wc -l
  114315
$ time find /usr/src -type f -exec grep foo {} + > /dev/null
    0m14.07s real     0m07.68s user     0m06.29s system
$ time find /usr/src -type f -print0 | xargs -0 grep foo > /dev/null
    0m13.83s real     0m07.94s user     0m06.25s system
-- 
Manuel Giraud





Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 17:48:17 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 13:48:17 2024
Received: from localhost ([127.0.0.1]:57284 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9q4X-0005fl-4S
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 13:48:17 -0400
Received: from fhigh1-smtp.messagingengine.com ([103.168.172.152]:34791)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dmitry@HIDDEN>) id 1s9q4S-0005ff-Dy
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 13:48:15 -0400
Received: from compute7.internal (compute7.nyi.internal [10.202.2.48])
 by mailfhigh.nyi.internal (Postfix) with ESMTP id 2371311401BC;
 Wed, 22 May 2024 13:48:01 -0400 (EDT)
Received: from mailfrontend1 ([10.202.2.162])
 by compute7.internal (MEProxy); Wed, 22 May 2024 13:48:01 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc
 :cc:content-transfer-encoding:content-type:content-type:date
 :date:from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:subject:subject:to:to; s=fm2; t=1716400081;
 x=1716486481; bh=oDEme5OS+jkJFZ7CkqHrd0XCZX3vDEP+eml7ZQsxZRw=; b=
 LnI4DrnF91Hg/Bet3+kxiODVSc+pucF2AOXQGHvrDR/dmAIcnOeSgNcpZhWVmjGU
 s2vyiiueXGvzwkBqmuhcs09h6ZgtzmeCstVl7fPtXnFMu+U9lWu4WjqKYl1diPfO
 IE7hrl9YBuWKzTptl7veUdQTa/CGg5wHILynveQDOAdDj8+jqG9g4zCb2JTCn6bg
 d9fjMrqkGR07vsf07o6mtOFTWqUYF9y8mrtSN5wjmGAWwgSnjoTa9BpaPFKcaQKg
 4TKHRv11cnowqWFqaLhf/aRDSVZ+ci2+ZHbWUJ0I0MGwP+i6AEyYR6nYNAoyKku5
 +v+ohoWqyWNwHvEHxPj15Q==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=cc:cc:content-transfer-encoding
 :content-type:content-type:date:date:feedback-id:feedback-id
 :from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy
 :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1716400081; x=
 1716486481; bh=oDEme5OS+jkJFZ7CkqHrd0XCZX3vDEP+eml7ZQsxZRw=; b=Q
 P8t7FTttwCTozcFm2DnrWHC/Z6qZWkSdUWm72AUvOLlNh1Utd2OC3HdJRjryzhb6
 FnyRvHhwW15XLixieQ21QcgYtD4G1fCtUwJCfJwF7pWQc7cgHB8qYS9TJKFJPqPM
 jI/vsqvC4v2cFl00CmTfPXEFv6wlydxF5IbeBAIMY2yJbuwxDI9e57mdnRGNSp+O
 +YQm/HqZzSb8n55AuBJUDHNeNM76NQbarfzN4TI566mVnJyRxXfdr1/5FlfB/ZYp
 0Kcer1R3gv+n1BIJNXbQ5sPqLhJLZrYkrczb7rUT1JbTGwYP2NbkH285Y/PZk2zQ
 KsCXM3hqIGgs3NGKMihdg==
X-ME-Sender: <xms:0C9OZtewoEAu47yx1f1-YEESce8G2VE3yeUlgNHBU2LtdTD-50_iqw>
 <xme:0C9OZrORdqPfH3yjD2O_NHfMZS4eEDaysR_bOLShK6mxwtQRvKjrS6bg6caKIn4KV
 rkdyaHDAqEvwXkN4to>
X-ME-Received: <xmr:0C9OZmgzx6LfFn7P1ztzLcHRTwyVZLtyRbsh_59LI2CM-_FovqGJNhsUd5Amsu1fn6dX>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrvdeigedgfeejucetufdoteggodetrfdotf
 fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen
 uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne
 cujfgurhepkfffgggfuffvvehfhfgjtgfgsehtjeertddtvdejnecuhfhrohhmpeffmhhi
 thhrhicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrth
 htvghrnhepteduleejgeehtefgheegjeekueehvdevieekueeftddvtdevfefhvdevgedu
 jeehnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepug
 hmihhtrhihsehguhhtohhvrdguvghv
X-ME-Proxy: <xmx:0C9OZm-_wx9s4CliynX8eyl3OkBF4tbKat0zkUrfONy-e3k2qZ2ffw>
 <xmx:0C9OZpsA7lyotAG3g0hw2tmvjrCMBs3glZH-XLLKMItnCDyEL9WSOg>
 <xmx:0C9OZlEgrN_MWX4l8zj-VTUdujtuxdOkp5pg0INwci9jVJcwcR0iTA>
 <xmx:0C9OZgManFyPYIb4H9_FVugWjAg-uv-hQTJ8MoftSL-5UmIq8AIBzA>
 <xmx:0S9OZjLGp9hexOIGQVepc42ek7eMCFc9tiSfgpCbx6KWjS51uyDaNNKL>
Feedback-ID: i0e71465a:Fastmail
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed,
 22 May 2024 13:47:59 -0400 (EDT)
Message-ID: <73b2c595-9200-4381-ae0f-2c3e1a2b1f29@HIDDEN>
Date: Wed, 22 May 2024 20:47:57 +0300
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
To: Eli Zaretskii <eliz@HIDDEN>
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
 <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN> <868r026jlq.fsf@HIDDEN>
 <dc79bcff-a5db-45c7-97f5-352d569617d0@HIDDEN> <861q5t7vrp.fsf@HIDDEN>
 <10f62497-dfb1-4c46-b18a-6d1100de4b6a@HIDDEN> <86wmnl6f62.fsf@HIDDEN>
Content-Language: en-US
From: Dmitry Gutov <dmitry@HIDDEN>
In-Reply-To: <86wmnl6f62.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 71094
Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN
X-Spam-Score: -1.0 (-)

On 22/05/2024 18:26, Eli Zaretskii wrote:

> I'd be happy if someone could time these methods on MS-Windows and on
> some *BSD system, at least.  Bonus points for macOS.

As luck would have it, I have an M3 Pro macOS laptop around.

The situation with it is odd, as usual. First of all, the default 
find/xargs/grep installed are some very slow versions from Apple.

The patch doesn't seem to change the performance of the search using 
them, it's just slow either way.

Things get better if I install the GNU versions from Homebrew and

    (setq grep-program "ggrep")

at startup. Performance gets better by 4x or so just from that, but 
still not to the level of my 5-year-old GNU/Linux laptop. The patch 
doesn't seem to make a difference still. If I also set

    (setq xargs-program "gxargs")

then the patch starts improving performance in a large directory (again: 
Linux kernel), by around 10%. Still more than 3x slower than on my older 
laptop with Linux. No idea why - the ggrep, gxargs and gfind executables 
are all reported to be arm64, so I can't blame the x64->arm64 
translation layer.

To sum up though, the patch under discussion doesn't make things worse 
on the macOS laptop I tested.
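
For anyone repeating this test from a shell, a small helper can pick the g-prefixed GNU tools (ggrep, gxargs, gfind, as installed by Homebrew) when they are on PATH and fall back to the system ones otherwise. This is only a sketch; the `pick_tool` name is invented here and nothing like it appears in the patch.

```shell
#!/bin/sh
# Print "g<name>" if an executable of that name is on PATH, else "<name>".
# pick_tool is a hypothetical helper for this illustration.
pick_tool() {
    if command -v "g$1" >/dev/null 2>&1; then
        echo "g$1"
    else
        echo "$1"
    fi
}

# Choose the programs once, then build the pipeline from them.
FIND=$(pick_tool find)
XARGS=$(pick_tool xargs)
GREP=$(pick_tool grep)
echo "using: $FIND | $XARGS | $GREP"
```

The printed names show which variants a later `"$FIND" ... -print0 | "$XARGS" -0 "$GREP" ...` pipeline would actually run, which matters here because the results above differ between the Apple and GNU versions.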





Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 15:27:09 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 11:27:08 2024
Received: from localhost ([127.0.0.1]:56658 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9nrw-0004NJ-H1
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 11:27:08 -0400
Received: from eggs.gnu.org ([209.51.188.92]:38428)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1s9nrs-0004Mt-8O
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 11:27:07 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1s9nrf-0005hz-Cx; Wed, 22 May 2024 11:26:51 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=7oSdqFsEHKK47UwzpdrnEjie/EJlJ+UT7R0PuVsBCys=; b=se02QEli8gQS
 10b1wzWpmAFb6eWPZzFlwnZIFdk5a802hIl7HNV8/Jc/CTZ3h0BBHLybhrkajStfCvYjuLo7SP9jW
 qABEpWKH19GT2Kw6VFLvGdQSq9ZJRzKOqp176XDUZ1F/ClTdqlW/5S6dFSllaz78fWqc6NfPx27LV
 EZQ8qhhdt6f/O7OdCydhL87viqDyHn68n2zDU3QQ06AeeULyXFJg9BSy32jOGpeKKng9UdIUzG5xd
 bD08O5OQAyF8FlO1ED+cOxkDder1kK8S8x01plblhAL+WY3hysVYlpoyfLPi0XKMQC1aeErAO++Rc
 e1cuUplU9e9mJ9ZBA1hsfA==;
Date: Wed, 22 May 2024 18:26:45 +0300
Message-Id: <86wmnl6f62.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dmitry@HIDDEN>
In-Reply-To: <10f62497-dfb1-4c46-b18a-6d1100de4b6a@HIDDEN> (message from
 Dmitry Gutov on Wed, 22 May 2024 17:50:42 +0300)
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
 <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN> <868r026jlq.fsf@HIDDEN>
 <dc79bcff-a5db-45c7-97f5-352d569617d0@HIDDEN> <861q5t7vrp.fsf@HIDDEN>
 <10f62497-dfb1-4c46-b18a-6d1100de4b6a@HIDDEN>
X-Spam-Score: -1.6 (-)
X-Debbugs-Envelope-To: 71094
Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN
X-Spam-Score: -2.6 (--)

> Date: Wed, 22 May 2024 17:50:42 +0300
> Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN
> From: Dmitry Gutov <dmitry@HIDDEN>
> 
> >> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
> >> characters. Meaning, it uses just 1.5 'grep' invocations. To see better
> >> parallelism there we'll need to either lower the limit or test it in a
> >> project at least twice as big.
> > 
> > ...until xargs collects all those characters, it will not invoke grep,
> > right?  So, for directories whose file names total less than those
> > 200K, xargs will still wait until find ends its job, right?
> 
> That's right. And it's why we're not seeing much of a difference in 
> projects of Emacs's size or smaller. No apparent regression either, though.

But we added xargs to the soup.  On GNU/Linux, where GNU Findutils are
developed, it probably isn't a problem.  On other systems, not
necessarily...

> >> So here is another example: a Linux kernel checkout (76K files). Also
> >> about 30% improvement: 1.40s vs 2.00s.
> > 
> > This is all highly system-dependent.
> 
> Naturally. So it'd be great to see some additional data points from 
> users on other systems.
> 
> Especially those where the default limit is lower than it is on mine.

I'd be happy if someone could time these methods on MS-Windows and on
some *BSD system, at least.  Bonus points for macOS.





Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 14:50:59 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 10:50:58 2024
Received: from localhost ([127.0.0.1]:56387 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9nIw-0003r7-J2
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 10:50:58 -0400
Received: from fhigh7-smtp.messagingengine.com ([103.168.172.158]:59819)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dmitry@HIDDEN>) id 1s9nIu-0003r1-GG
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 10:50:57 -0400
Received: from compute2.internal (compute2.nyi.internal [10.202.2.46])
 by mailfhigh.nyi.internal (Postfix) with ESMTP id 781C71140192;
 Wed, 22 May 2024 10:50:45 -0400 (EDT)
Received: from mailfrontend1 ([10.202.2.162])
 by compute2.internal (MEProxy); Wed, 22 May 2024 10:50:45 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc
 :cc:content-transfer-encoding:content-type:content-type:date
 :date:from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:subject:subject:to:to; s=fm2; t=1716389445;
 x=1716475845; bh=s1Dij0eugnhR43c2DtGd+4b3xenJ9eFS7s0V/Iyg9vQ=; b=
 jMqRWJ+Gg/vuy172V73snJBe2J5MB0vlpUrPptqblT7/apeHVGo/ZbgvAhpNn7wm
 JI44c0WpqLrb2zvtnDIidUGTSBu3iw78ns6tr3dDm2+JGdmSnXmtq6/av/fj+fEl
 Ztm9Msx7awwSpqxPbmeZU5XBea0+ShE6QCMetprxK1wATNZ3klYDu7xW/20UJBum
 ffOVtysxhfBBSNTeukWUpfdQhM/9zXRhYP52D8ZYZAbdo9r634mNnptN4gf9NtzE
 IhhUWa32EO06lyLbS5VNRVVoU0Ke7b3t3yDIf2sFowF/8I7l15VzlL8eBWJcpSXH
 qIaGXjPCDXQibzkyxsi1oQ==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=cc:cc:content-transfer-encoding
 :content-type:content-type:date:date:feedback-id:feedback-id
 :from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy
 :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1716389445; x=
 1716475845; bh=s1Dij0eugnhR43c2DtGd+4b3xenJ9eFS7s0V/Iyg9vQ=; b=H
 EvgsA23ZV0woozK8dvBp3Ve3Umau0nlYHI6pBuwt3YW35Ol3OjrDf49Uuy/fKCGo
 ea/8uElj+083MJEDvKMqrHPxyHRamhP289yvm9SafcB8cAHrF0AzaKjo4xKg5SLR
 rz4fSzWTCQy2ehdGvd6vpFNUFX4doAqeL7nir9u+NqOwuPv7jmJOlLWCTHFfQupI
 TxjkM5tXZZUGxidOfcknUMjCn0eQfzhMrbGmbZsgIiEE1WnVGcgb+OszczTHPRen
 CR6js47gpEkzxL9j7fZp6+7AqbcFkWEAj0GMNhqIdgIMqODY0Ra6QKhgM5EgywpM
 MK49AkvAeZb1tYdSweNmQ==
X-ME-Sender: <xms:RQZOZrs2054uDgYirBkr57WdVH4weTJ0qDpkI0R9Mf1tx2u9FpS-WQ>
 <xme:RQZOZsdrdEkbIXfkyR0xLxSMqsa4kjKLrjTOkKHPYgPu1uld3p2o4AmfAWMRqc7La
 opiZ7QHPt5sePMgPKU>
X-ME-Received: <xmr:RQZOZuwFO8TAgGbWFLbXA54uoL8q-6_BwBvPEJHBcddWn1__EB5ED-sbLigw0jAHDeVv>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrvdeigedgvddtucetufdoteggodetrfdotf
 fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen
 uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne
 cujfgurhepkfffgggfuffvvehfhfgjtgfgsehtjeertddtvdejnecuhfhrohhmpeffmhhi
 thhrhicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrth
 htvghrnhepteduleejgeehtefgheegjeekueehvdevieekueeftddvtdevfefhvdevgedu
 jeehnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepug
 hmihhtrhihsehguhhtohhvrdguvghv
X-ME-Proxy: <xmx:RQZOZqP55vbTE2dIrrz1oGjWenOZODt9sXrYZKnux2-yXCtCzhPdwA>
 <xmx:RQZOZr-AKF1SxJiFndT3hyk25Zxj_OLkxHUb3d8-DM5D6zm9xQ3cxA>
 <xmx:RQZOZqUH5eAv7-MlAlkoDCk66h_1b_VDv5P-3RJfhHSK5JP1iXh7Lg>
 <xmx:RQZOZscQHBupjRDN7fW58HStI6D0T_6x2zrGGYqRR5bbxuuUjqnEUg>
 <xmx:RQZOZnZRvkdPmDqZ_gSagwsK4wcczHfPjhSKlNNjhuHujgy7FtTCkeoI>
Feedback-ID: i0e71465a:Fastmail
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed,
 22 May 2024 10:50:44 -0400 (EDT)
Message-ID: <10f62497-dfb1-4c46-b18a-6d1100de4b6a@HIDDEN>
Date: Wed, 22 May 2024 17:50:42 +0300
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
To: Eli Zaretskii <eliz@HIDDEN>
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
 <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN> <868r026jlq.fsf@HIDDEN>
 <dc79bcff-a5db-45c7-97f5-352d569617d0@HIDDEN> <861q5t7vrp.fsf@HIDDEN>
Content-Language: en-US
From: Dmitry Gutov <dmitry@HIDDEN>
In-Reply-To: <861q5t7vrp.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 71094
Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN
X-Spam-Score: -1.0 (-)

On 22/05/2024 17:42, Eli Zaretskii wrote:
>>> That's true, but what is your mental model of how the pipe with xargs
>>> works in practice?  How many invocations of grep will xargs do, and
>>> when will the first invocation happen?
>>
>> In my mental model xargs acts like an asynchronous queue with batch
>> processing. The first invocation will happen after the output reaches
>> the maximum line length or maximum number of arguments configured. They
>> are system-dependent by default.
> 
> And can be rather small.  But if it is large, then...
> 
>> For example, on my system 'xargs --show-limits' says
>>
>>     Size of command buffer we are actually using: 131072
>>
>> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
>> characters. Meaning, it uses just 1.5 'grep' invocations. To see better
>> parallelism there we'll need to either lower the limit or test it in a
>> project at least twice as big.
> 
> ...until xargs collects all those characters, it will not invoke grep,
> right?  So, for directories whose file names total less than those
> 200K, xargs will still wait until find ends its job, right?

That's right. And it's why we're not seeing much of a difference in 
projects of Emacs's size or smaller. No apparent regression either, though.
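
The arithmetic behind that "1.5 invocations" figure can be sketched as follows, under the simplifying assumption that xargs does nothing more than fill a fixed-size buffer with file names and run grep each time the buffer is full. Real xargs also has to account for the command itself and other limits, so treat the result as an estimate. The `xargs_batches` name is made up for this illustration.

```shell
#!/bin/sh
# Estimate the number of command invocations as ceil(total / buffer),
# where both arguments are byte counts.
# Assumption: xargs flushes only when its buffer fills or input ends.
xargs_batches() {
    total=$1
    buffer=$2
    echo $(( ($total + $buffer - 1) / $buffer ))
}

# Figures quoted above: 202928 characters of file names and a
# 131072-byte command buffer.
xargs_batches 202928 131072   # prints 2
```

With those figures the first grep can start only after about 65% of the file names have been produced (131072 of 202928), and the second starts once find has finished, which is consistent with seeing little parallelism in a tree of this size.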

>> So here is another example: a Linux kernel checkout (76K files). Also
>> about 30% improvement: 1.40s vs 2.00s.
> 
> This is all highly system-dependent.

Naturally. So it'd be great to see some additional data points from 
users on other systems.

Especially those where the default limit is lower than it is on mine.





Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 14:43:10 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 10:43:10 2024
Received: from localhost ([127.0.0.1]:56340 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9nBO-0003lr-42
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 10:43:10 -0400
Received: from eggs.gnu.org ([209.51.188.92]:44050)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1s9nBJ-0003lP-Qk
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 10:43:08 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1s9nB7-0005XR-UK; Wed, 22 May 2024 10:42:53 -0400
Date: Wed, 22 May 2024 17:42:50 +0300
Message-Id: <861q5t7vrp.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dmitry@HIDDEN>
In-Reply-To: <dc79bcff-a5db-45c7-97f5-352d569617d0@HIDDEN> (message from
 Dmitry Gutov on Wed, 22 May 2024 17:22:56 +0300)
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
 <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN> <868r026jlq.fsf@HIDDEN>
 <dc79bcff-a5db-45c7-97f5-352d569617d0@HIDDEN>
X-Spam-Score: -1.6 (-)
X-Debbugs-Envelope-To: 71094
Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN

> Date: Wed, 22 May 2024 17:22:56 +0300
> Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN
> From: Dmitry Gutov <dmitry@HIDDEN>
> 
> >> The directory where I saw significant improvement has 300K files.
> > 
> > That's what I thought.  So we are changing the decade-old defaults to
> > favor huge directories, which is not necessarily the wisest thing to
> > do.
> 
> I don't see any regression on small directories, though. And an 
> improvement on big ones.

On your system.

> > That's true, but what is your mental model of how the pipe with xargs
> > works in practice?  How many invocations of grep will xargs do, and
> > when will the first invocation happen?
> 
> In my mental model xargs acts like an asynchronous queue with batch 
> processing. The first invocation will happen after the output reaches
> the maximum command-line length or the maximum number of arguments
> configured. Both limits are system-dependent by default.

And can be rather small.  But if it is large, then...

> For example, on my system 'xargs --show-limits' says
> 
>    Size of command buffer we are actually using: 131072
> 
> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928 
> characters. Meaning, it uses just 1.5 'grep' invocations. To see better 
> parallelism there we'll need to either lower the limit or test it in a 
> project at least twice as big.

...until xargs collects all those characters, it will not invoke grep,
right?  So, for directories whose file names total less than those
200K, xargs will still wait until find ends its job, right?

> So here is another example: a Linux kernel checkout (76K files). Also 
> about 30% improvement: 1.40s vs 2.00s.

This is all highly system-dependent.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#71094; Package emacs. Full text available.

Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 14:23:15 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 10:23:15 2024
Received: from localhost ([127.0.0.1]:56246 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9ms7-0003WD-Do
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 10:23:15 -0400
Received: from fout1-smtp.messagingengine.com ([103.168.172.144]:32869)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dmitry@HIDDEN>) id 1s9ms3-0003Vm-Jy
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 10:23:13 -0400
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
 by mailfout.nyi.internal (Postfix) with ESMTP id 760EA13800B4;
 Wed, 22 May 2024 10:23:00 -0400 (EDT)
Received: from mailfrontend1 ([10.202.2.162])
 by compute1.internal (MEProxy); Wed, 22 May 2024 10:23:00 -0400
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed,
 22 May 2024 10:22:58 -0400 (EDT)
Message-ID: <dc79bcff-a5db-45c7-97f5-352d569617d0@HIDDEN>
Date: Wed, 22 May 2024 17:22:56 +0300
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
To: Eli Zaretskii <eliz@HIDDEN>
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
 <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN> <868r026jlq.fsf@HIDDEN>
Content-Language: en-US
From: Dmitry Gutov <dmitry@HIDDEN>
In-Reply-To: <868r026jlq.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 71094
Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN

On 22/05/2024 16:50, Eli Zaretskii wrote:
>> Date: Wed, 22 May 2024 15:34:06 +0300
>> Cc: 71094 <at> debbugs.gnu.org, rgm@HIDDEN
>> From: Dmitry Gutov <dmitry@HIDDEN>
>>
>> On 22/05/2024 14:59, Eli Zaretskii wrote:
>>
>>> With how many files did you measure the 40% speedup?  Can you show the
>>> performance with much fewer and much more files than what you used?
>>
>> FWIW my test indicated that for a smaller project (such as Emacs) the
>> difference is fairly small - the new code is slightly better or the same.
>>
>> The directory where I saw significant improvement has 300K files.
> 
> That's what I thought.  So we are changing the decade-old defaults to
> favor huge directories, which is not necessarily the wisest thing to
> do.

I don't see any regression on small directories, though. And an 
improvement on big ones.

So the way I see it, we're expanding Emacs's applicability to a wider
audience without any apparent drawbacks.

It might actually give us an improvement in smaller projects as well, if 
we decrease xargs's batch size (with -s or -n). But those are fairly 
fast already, so it's not critical.
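
As a rough illustration of the batching knob mentioned above (not something measured in this thread): the sketch below creates a few throwaway files and lets 'xargs -n 2' split them into batches. Because 'echo' prints one line per invocation, counting output lines counts invocations. The scratch directory and the batch size of 2 are assumptions chosen for the demo, and '-print0'/'-0' require find and xargs implementations that support them.

```shell
#!/bin/sh
# Create five empty files in a scratch directory (hypothetical demo data).
tmpdir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$tmpdir/file$i.txt"; done

# 'echo' prints one line per invocation, so counting lines counts batches.
# With -n 2, five file names are split into batches of 2, 2 and 1.
batches=$(find "$tmpdir" -type f -print0 | xargs -0 -n 2 echo | wc -l | tr -d ' ')
echo "invocations with -n 2: $batches"

rm -rf "$tmpdir"
```

With five files this should report three invocations; a smaller batch size means the first command can start sooner, at the cost of more process launches.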

>>> I
>>> suspect that the effect depends on that.  (It also depends on the
>>> system limit on the number of files and the length of the command line
>>> that xargs can use.)  The argument about 'find' waiting is no longer
>>> relevant with 'exec-plus', since in most cases there will be just one
>>> invocation of 'grep'.
>>
>> If there's just one invocation, wouldn't that mean that it will happen
>> at the end of the full directory scan? Rather than in parallel.
> 
> That's true, but what is your mental model of how the pipe with xargs
> works in practice?  How many invocations of grep will xargs do, and
> when will the first invocation happen?

In my mental model xargs acts like an asynchronous queue with batch 
processing. The first invocation will happen after the output reaches
the maximum command-line length or the maximum number of arguments
configured. Both limits are system-dependent by default.

For example, on my system 'xargs --show-limits' says

   Size of command buffer we are actually using: 131072

Whereas in the Emacs repository "find ... -print0 | wc" reports 202928 
characters. Meaning, it uses just 1.5 'grep' invocations. To see better 
parallelism there we'll need to either lower the limit or test it in a 
project at least twice as big.
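
The arithmetic behind that estimate can be sketched as follows. It uses only the two numbers quoted above and assumes, as a simplification, that xargs fills each command line right up to the reported buffer size. Real batches will be slightly smaller, since the command and its fixed arguments also count against the limit and a file name is never split across invocations.

```shell
#!/bin/sh
# Numbers quoted in the message above.
buffer=131072   # "Size of command buffer we are actually using"
total=202928    # characters reported by "find ... -print0 | wc"

# Ceiling division: grep invocations needed to cover every file name.
invocations=$(( (total + buffer - 1) / buffer ))

# Share of find's output that must arrive before the first batch is full.
first_batch_pct=$(( buffer * 100 / total ))

echo "grep invocations: $invocations"
echo "first grep starts after about ${first_batch_pct}% of the file list"
```

Under these assumptions the first grep only starts once roughly two thirds of the file list has been produced, which leaves little room for overlap in a tree of this size.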

So here is another example: a Linux kernel checkout (76K files). Also 
about 30% improvement: 1.40s vs 2.00s.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#71094; Package emacs. Full text available.

Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 13:51:26 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 09:51:26 2024
Received: from localhost ([127.0.0.1]:56085 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9mNK-0003B6-4i
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 09:51:26 -0400
Received: from eggs.gnu.org ([209.51.188.92]:49194)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1s9mNF-0003B0-Bz
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 09:51:25 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1s9mN3-0004WR-A7; Wed, 22 May 2024 09:51:09 -0400
Date: Wed, 22 May 2024 16:50:57 +0300
Message-Id: <868r026jlq.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dmitry@HIDDEN>
In-Reply-To: <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN> (message from
 Dmitry Gutov on Wed, 22 May 2024 15:34:06 +0300)
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
 <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN>
X-Spam-Score: -1.6 (-)
X-Debbugs-Envelope-To: 71094
Cc: sbaugh@HIDDEN, 71094 <at> debbugs.gnu.org, rgm@HIDDEN

> Date: Wed, 22 May 2024 15:34:06 +0300
> Cc: 71094 <at> debbugs.gnu.org, rgm@HIDDEN
> From: Dmitry Gutov <dmitry@HIDDEN>
> 
> On 22/05/2024 14:59, Eli Zaretskii wrote:
> 
> > With how many files did you measure the 40% speedup?  Can you show the
> > performance with much fewer and much more files than what you used?
> 
> FWIW my test indicated that for a smaller project (such as Emacs) the 
> difference is fairly small - the new code is slightly better or the same.
> 
> The directory where I saw significant improvement has 300K files.

That's what I thought.  So we are changing the decade-old defaults to
favor huge directories, which is not necessarily the wisest thing to
do.

> > I
> > suspect that the effect depends on that.  (It also depends on the
> > system limit on the number of files and the length of the command line
> > that xargs can use.)  The argument about 'find' waiting is no longer
> > relevant with 'exec-plus', since in most cases there will be just one
> > invocation of 'grep'.
> 
> If there's just one invocation, wouldn't that mean that it will happen 
> at the end of the full directory scan? Rather than in parallel.

That's true, but what is your mental model of how the pipe with xargs
works in practice?  How many invocations of grep will xargs do, and
when will the first invocation happen?




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#71094; Package emacs. Full text available.

Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 12:54:39 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 08:54:39 2024
Received: from localhost ([127.0.0.1]:55789 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9lUM-0002Ot-Rw
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 08:54:39 -0400
Received: from mxout6.mail.janestreet.com ([64.215.233.21]:59437)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <sbaugh@HIDDEN>) id 1s9lUK-0002On-Jn
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 08:54:37 -0400
From: Spencer Baugh <sbaugh@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
In-Reply-To: <86ttiq6or8.fsf@HIDDEN> (Eli Zaretskii's message of "Wed, 22 May
 2024 14:59:39 +0300")
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
Date: Wed, 22 May 2024 08:54:25 -0400
Message-ID: <ierplte9fcu.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Spam-Score: 0.7 (/)
X-Debbugs-Envelope-To: 71094
Cc: rgm@HIDDEN, 71094 <at> debbugs.gnu.org, dmitry@HIDDEN

--=-=-=
Content-Type: text/plain

Eli Zaretskii <eliz@HIDDEN> writes:

>> Cc: Glenn Morris <rgm@HIDDEN>, dmitry@HIDDEN
>> From: Spencer Baugh <sbaugh@HIDDEN>
>> Date: Tue, 21 May 2024 10:35:07 -0400
>> 
>> grep.el prefers to run "find" and "xargs grep" in a pipeline,
>> which means that "find" can continue searching the filesystem
>> while "xargs grep" searches files.  If find and xargs don't
>> support the flags required for this behavior, grep.el will fall
>> back to using the -exec flags to "find", which means "find" will
>> wait for each "grep" process to complete before continuing to
>> search the filesystem tree.  This behavior is controlled by
>> grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
>> the slower fallback.
>> 
>> In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
>> option was added for grep-find-use-xargs, which improves on
>> `exec' by running one "grep" process to search multiple files,
>> which `gnu' (by using xargs) already did.  However, the change
>> erroneously added the `exec-plus' case before the `gnu' case in
>> the autodetection code in grep-compute-defaults, so `exec-plus'
>> would be used even if `gnu' was supported.
>> 
>> This change just swaps the two cases, so the faster `gnu' option
>> is once again used in preference to `exec-plus'.  In my
>> benchmarking on a large repository, this provides a ~40%
>> speedup.
>
> With how many files did you measure the 40% speedup?

700k

> Can you show the performance with much fewer and much more files than
> what you used?

Much more is maybe hard, but much fewer is easy: with 212 files (a
subset of the original directory I searched), there's no performance
change.

> I suspect that the effect depends on that.  (It also depends on the
> system limit on the number of files and the length of the command line
> that xargs can use.)  The argument about 'find' waiting is no longer
> relevant with 'exec-plus', since in most cases there will be just one
> invocation of 'grep'.

True, it only matters when the directory tree contains more files than
can be passed to a single invocation of grep.
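
To make the three strategies concrete, the sketch below runs each command shape over a small scratch directory, with 'echo' standing in for 'grep' so that each output line represents one process invocation. The directory, the file count, and the use of 'echo' are assumptions for illustration only. With so few files both batching forms need a single invocation, which is consistent with the point above that the difference only shows once the file list overflows one command line.

```shell
#!/bin/sh
# Scratch directory with four empty files (hypothetical demo data).
tmpdir=$(mktemp -d)
for i in 1 2 3 4; do : > "$tmpdir/f$i"; done

# 'echo' prints one line per invocation, so lines == invocations.
count_lines() { wc -l | tr -d ' '; }

# `gnu': find streams names into xargs, which batches them.
n_gnu=$(find "$tmpdir" -type f -print0 | xargs -0 echo | count_lines)

# `exec-plus': find batches the names itself and runs the command.
n_plus=$(find "$tmpdir" -type f -exec echo {} + | count_lines)

# `exec': one command per file.
n_exec=$(find "$tmpdir" -type f -exec echo {} \; | count_lines)

echo "gnu=$n_gnu exec-plus=$n_plus exec=$n_exec"
rm -rf "$tmpdir"
```

On this tiny tree the counts should come out as one, one and four respectively.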

> In any case, please modify the patch so that 'exec-plus' is still
> preferred on MS-Windows (because most Windows ports of xargs are IME
> abysmally buggy, so better avoided as much as possible).
>
> A comment there with the justification of the order will also be
> appreciated.

Done, attached.


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline;
 filename=0001-Prefer-to-run-find-and-grep-in-parallel-in-rgrep.patch

From e7fbfe431ae1f4f004f1d92db2f3b011b30ff682 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@HIDDEN>
Date: Tue, 21 May 2024 10:32:45 -0400
Subject: [PATCH] Prefer to run find and grep in parallel in rgrep

grep.el prefers to run "find" and "xargs grep" in a pipeline,
which means that "find" can continue searching the filesystem
while "xargs grep" searches files.  If find and xargs don't
support the flags required for this behavior, grep.el will fall
back to using the -exec flags to "find", which means "find" will
wait for each "grep" process to complete before continuing to
search the filesystem tree.  This behavior is controlled by
grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
the slower fallback.

In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
option was added for grep-find-use-xargs, which improves on
`exec' by running one "grep" process to search multiple files,
which `gnu' (by using xargs) already did.  However, the change
erroneously added the `exec-plus' case before the `gnu' case in
the autodetection code in grep-compute-defaults, so `exec-plus'
would be used even if `gnu' was supported.

This change just swaps the two cases, so the faster `gnu' option
is once again used in preference to `exec-plus'.  In my
benchmarking on a large repository, this provides a ~40%
speedup.

Also, we completely avoid running xargs on MS-Windows, because Eli
Zaretskii <eliz@HIDDEN> writes:

> most Windows ports of xargs are IME abysmally buggy, so better avoided
> as much as possible

* lisp/progmodes/grep.el (grep-compute-defaults): Prefer `gnu' for
grep-find-use-xargs over `exec-plus', but not on Windows.  (bug#71094)
---
 lisp/progmodes/grep.el | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el
index 0a9de04fce1..ce54c57aabc 100644
--- a/lisp/progmodes/grep.el
+++ b/lisp/progmodes/grep.el
@@ -812,15 +812,23 @@ grep-compute-defaults
 	(unless grep-find-use-xargs
 	  (setq grep-find-use-xargs
 		(cond
-		 ((grep-probe find-program
-			      `(nil nil nil ,(null-device) "-exec" "echo"
-				    "{}" "+"))
-		  'exec-plus)
+                 ;; For performance, we want:
+                 ;; A. Run grep on batches of files (instead of one grep per file)
+                 ;; B. If the directory is large and we need multiple batches,
+                 ;;    run find in parallel with a running grep.
+                 ;; "find | xargs grep" gives both A and B
 		 ((and
+                   (not (eq system-type 'windows-nt))
 		   (grep-probe
                     find-program `(nil nil nil ,(null-device) "-print0"))
 		   (grep-probe xargs-program '(nil nil nil "-0" "echo")))
 		  'gnu)
+                 ;; "find -exec {} +" gives A but not B
+		 ((grep-probe find-program
+			      `(nil nil nil ,(null-device) "-exec" "echo"
+				    "{}" "+"))
+		  'exec-plus)
+                 ;; "find -exec {} ;" gives neither A nor B.
 		 (t
 		  'exec))))
 	(unless grep-find-command
-- 
2.39.3


--=-=-=--




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#71094; Package emacs. Full text available.

Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 12:34:25 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 08:34:25 2024
Received: from localhost ([127.0.0.1]:55694 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9lAm-0002EF-S5
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 08:34:25 -0400
Received: from fhigh6-smtp.messagingengine.com ([103.168.172.157]:45049)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dmitry@HIDDEN>) id 1s9lAk-0002E9-Da
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 08:34:23 -0400
Received: from compute6.internal (compute6.nyi.internal [10.202.2.47])
 by mailfhigh.nyi.internal (Postfix) with ESMTP id 65BCF1140186;
 Wed, 22 May 2024 08:34:10 -0400 (EDT)
Received: from mailfrontend1 ([10.202.2.162])
 by compute6.internal (MEProxy); Wed, 22 May 2024 08:34:10 -0400
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed,
 22 May 2024 08:34:09 -0400 (EDT)
Message-ID: <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@HIDDEN>
Date: Wed, 22 May 2024 15:34:06 +0300
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
To: Eli Zaretskii <eliz@HIDDEN>, Spencer Baugh <sbaugh@HIDDEN>
References: <ierv8379qsk.fsf@HIDDEN> <86ttiq6or8.fsf@HIDDEN>
Content-Language: en-US
From: Dmitry Gutov <dmitry@HIDDEN>
In-Reply-To: <86ttiq6or8.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 71094
Cc: rgm@HIDDEN, 71094 <at> debbugs.gnu.org

On 22/05/2024 14:59, Eli Zaretskii wrote:

> With how many files did you measure the 40% speedup?  Can you show the
> performance with much fewer and much more files than what you used?

FWIW my test indicated that for a smaller project (such as Emacs) the 
difference is fairly small - the new code is slightly better or the same.

The directory where I saw significant improvement has 300K files.

> I
> suspect that the effect depends on that.  (It also depends on the
> system limit on the number of files and the length of the command line
> that xargs can use.)  The argument about 'find' waiting is no longer
> relevant with 'exec-plus', since in most cases there will be just one
> invocation of 'grep'.

If there's just one invocation, wouldn't that mean that it will happen 
at the end of the full directory scan? Rather than in parallel.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#71094; Package emacs. Full text available.

Message received at 71094 <at> debbugs.gnu.org:


Received: (at 71094) by debbugs.gnu.org; 22 May 2024 12:00:04 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed May 22 08:00:04 2024
Received: from localhost ([127.0.0.1]:55506 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1s9kdX-0001rJ-Lk
	for submit <at> debbugs.gnu.org; Wed, 22 May 2024 08:00:04 -0400
Received: from eggs.gnu.org ([209.51.188.92]:55314)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1s9kdV-0001qH-Ur
 for 71094 <at> debbugs.gnu.org; Wed, 22 May 2024 08:00:02 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1s9kdJ-0007ss-Mm; Wed, 22 May 2024 07:59:49 -0400
Date: Wed, 22 May 2024 14:59:39 +0300
Message-Id: <86ttiq6or8.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Spencer Baugh <sbaugh@HIDDEN>
In-Reply-To: <ierv8379qsk.fsf@HIDDEN> (message from Spencer Baugh on
 Tue, 21 May 2024 10:35:07 -0400)
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
References: <ierv8379qsk.fsf@HIDDEN>
Cc: rgm@HIDDEN, 71094 <at> debbugs.gnu.org, dmitry@HIDDEN

> Cc: Glenn Morris <rgm@HIDDEN>, dmitry@HIDDEN
> From: Spencer Baugh <sbaugh@HIDDEN>
> Date: Tue, 21 May 2024 10:35:07 -0400
> 
> grep.el prefers to run "find" and "xargs grep" in a pipeline,
> which means that "find" can continue searching the filesystem
> while "xargs grep" searches files.  If find and xargs don't
> support the flags required for this behavior, grep.el will fall
> back to using the -exec flags to "find", which means "find" will
> wait for each "grep" process to complete before continuing to
> search the filesystem tree.  This behavior is controlled by
> grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
> the slower fallback.
> 
> In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
> option was added for grep-find-use-xargs, which improves on
> `exec' by running one "grep" process to search multiple files,
> which `gnu' (by using xargs) already did.  However, the change
> erroneously added the `exec-plus' case before the `gnu' case in
> the autodetection code in grep-compute-defaults, so `exec-plus'
> would be used even if `gnu' was supported.
> 
> This change just swaps the two cases, so the faster `gnu' option
> is once again used in preference to `exec-plus'.  In my
> benchmarking on a large repository, this provides a ~40%
> speedup.

With how many files did you measure the 40% speedup?  Can you show the
performance with far fewer and far more files than you used?  I
suspect that the effect depends on that.  (It also depends on the
system limit on the number of files and the length of the command line
that xargs can use.)  The argument about 'find' waiting is no longer
relevant with 'exec-plus', since in most cases there will be just one
invocation of 'grep'.
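
As a rough illustration of that limit (a sketch, not something measured in
this report): `getconf ARG_MAX' reports the system's bound on the size of
an argument list, and a small `-n' value forces xargs to split its input
the same way a full command line would.

```shell
#!/bin/sh
# Upper bound, in bytes, on arguments plus environment for one exec call.
arg_max=$(getconf ARG_MAX)

# Five made-up names, at most two per command line: xargs has to run
# echo three times, printing one line per run.
batches=$(printf '%s\n' f1 f2 f3 f4 f5 | xargs -n 2 echo | grep -c .)

echo "ARG_MAX: $arg_max"
echo "invocations with -n 2: $batches"
```

A tree whose file names fit within the limit gets one `grep'; a larger one
is split into several runs.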

In any case, please modify the patch so that 'exec-plus' is still
preferred on MS-Windows (because most Windows ports of xargs are IME
abysmally buggy, so better avoided as much as possible).

A comment there with the justification of the order will also be
appreciated.

Thanks.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#71094; Package emacs. Full text available.

Message received at 71094 <at> debbugs.gnu.org:


Message-ID: <40400546-1bea-42c5-87d0-407e0b744804@HIDDEN>
Date: Tue, 21 May 2024 23:00:06 +0300
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: bug#71094: [PATCH] Prefer to run find and grep in parallel in
 rgrep
To: Spencer Baugh <sbaugh@HIDDEN>, 71094 <at> debbugs.gnu.org
References: <ierv8379qsk.fsf@HIDDEN>
Content-Language: en-US
From: Dmitry Gutov <dmitry@HIDDEN>
In-Reply-To: <ierv8379qsk.fsf@HIDDEN>
Cc: Glenn Morris <rgm@HIDDEN>

Hi Spencer,

On 21/05/2024 17:35, Spencer Baugh wrote:
> In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
> option was added for grep-find-use-xargs, which improves on
> `exec' by running one "grep" process to search multiple files,
> which `gnu' (by using xargs) already did.  However, the change
> erroneously added the `exec-plus' case before the `gnu' case in
> the autodetection code in grep-compute-defaults, so `exec-plus'
> would be used even if `gnu' was supported.

Perhaps the thinking was that piping data through one more program, with
the associated copying, should be more expensive than delegating that to 'find'.

> This change just swaps the two cases, so the faster `gnu' option
> is once again used in preference to `exec-plus'.  In my
> benchmarking on a large repository, this provides a ~40%
> speedup.

I can confirm an improvement of ~30% here, specifically in the "many
files, few matches" scenario. Nice find.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#71094; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


From: Spencer Baugh <sbaugh@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Subject: [PATCH] Prefer to run find and grep in parallel in rgrep
Date: Tue, 21 May 2024 10:35:07 -0400
Message-ID: <ierv8379qsk.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: Glenn Morris <rgm@HIDDEN>, dmitry@HIDDEN

--=-=-=
Content-Type: text/plain

Tags: patch


grep.el prefers to run "find" and "xargs grep" in a pipeline,
which means that "find" can continue searching the filesystem
while "xargs grep" searches files.  If find and xargs don't
support the flags required for this behavior, grep.el will fall
back to using the -exec flags to "find", which means "find" will
wait for each "grep" process to complete before continuing to
search the filesystem tree.  This behavior is controlled by
grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
the slower fallback.
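
For illustration, the three command shapes look roughly like this.  This
is a sketch on a made-up scratch tree with a made-up pattern; the actual
command templates that grep.el generates carry more flags than shown here.

```shell
#!/bin/sh
# Hypothetical scratch tree: two files contain "needle", one does not.
tree=$(mktemp -d)
mkdir -p "$tree/a/b"
printf 'needle\n' > "$tree/a/one.txt"
printf 'hay\n'    > "$tree/a/b/two.txt"
printf 'needle\n' > "$tree/a/b/three.txt"

# `gnu': find streams NUL-terminated names while xargs feeds them to grep.
gnu_out=$(find "$tree" -type f -print0 | xargs -0 grep -l needle | sort)

# `exec': find starts one grep per file and waits for it each time.
exec_out=$(find "$tree" -type f -exec grep -l needle {} \; | sort)

# `exec-plus': find collects names itself and runs grep on each batch.
plus_out=$(find "$tree" -type f -exec grep -l needle {} + | sort)

rm -rf "$tree"
printf '%s\n' "$gnu_out"
```

All three report the same matching files; they differ only in how many
"grep" processes are started and in whether "find" keeps walking the tree
while they run.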

In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
option was added for grep-find-use-xargs, which improves on
`exec' by running one "grep" process to search multiple files,
which `gnu' (by using xargs) already did.  However, the change
erroneously added the `exec-plus' case before the `gnu' case in
the autodetection code in grep-compute-defaults, so `exec-plus'
would be used even if `gnu' was supported.

This change just swaps the two cases, so the faster `gnu' option
is once again used in preference to `exec-plus'.  In my
benchmarking on a large repository, this provides a ~40%
speedup.


In GNU Emacs 29.2.50 (build 11, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2024-05-15 built on
 igm-qws-u22796a
Repository revision: 734740051bd377d24899d08d00ec8e1bb8e00e00
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.9 (Green Obsidian)

Configured using:
 'configure -C --with-x-toolkit=lucid --with-gif=ifavailable'


--=-=-=
Content-Type: text/patch
Content-Disposition: attachment;
 filename=0001-Prefer-to-run-find-and-grep-in-parallel-in-rgrep.patch

From 06f0683b51088e4c1c080408624f310d6561a381 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@HIDDEN>
Date: Tue, 21 May 2024 10:32:45 -0400
Subject: [PATCH] Prefer to run find and grep in parallel in rgrep

grep.el prefers to run "find" and "xargs grep" in a pipeline,
which means that "find" can continue searching the filesystem
while "xargs grep" searches files.  If find and xargs don't
support the flags required for this behavior, grep.el will fall
back to using the -exec flags to "find", which means "find" will
wait for each "grep" process to complete before continuing to
search the filesystem tree.  This behavior is controlled by
grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
the slower fallback.

In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
option was added for grep-find-use-xargs, which improves on
`exec' by running one "grep" process to search multiple files,
which `gnu' (by using xargs) already did.  However, the change
erroneously added the `exec-plus' case before the `gnu' case in
the autodetection code in grep-compute-defaults, so `exec-plus'
would be used even if `gnu' was supported.

This change just swaps the two cases, so the faster `gnu' option
is once again used in preference to `exec-plus'.  In my
benchmarking on a large repository, this provides a ~40%
speedup.

* lisp/progmodes/grep.el (grep-compute-defaults): Prefer `gnu'
for grep-find-use-xargs over `exec-plus'.
---
 lisp/progmodes/grep.el | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el
index 657349cbdff..04056e13685 100644
--- a/lisp/progmodes/grep.el
+++ b/lisp/progmodes/grep.el
@@ -812,15 +812,15 @@ grep-compute-defaults
 	(unless grep-find-use-xargs
 	  (setq grep-find-use-xargs
 		(cond
-		 ((grep-probe find-program
-			      `(nil nil nil ,(null-device) "-exec" "echo"
-				    "{}" "+"))
-		  'exec-plus)
 		 ((and
 		   (grep-probe
                     find-program `(nil nil nil ,(null-device) "-print0"))
 		   (grep-probe xargs-program '(nil nil nil "-0" "echo")))
 		  'gnu)
+		 ((grep-probe find-program
+			      `(nil nil nil ,(null-device) "-exec" "echo"
+				    "{}" "+"))
+		  'exec-plus)
 		 (t
 		  'exec))))
 	(unless grep-find-command
-- 
2.39.3


--=-=-=--
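
The probe order after the patch can be approximated from a shell as
follows.  This is a sketch: `detect_find_mode' is a hypothetical helper,
not part of grep.el, and it assumes that a probe succeeds exactly when the
probed command exits with status 0 on empty input.

```shell
#!/bin/sh
# Hypothetical helper: try the pipeline-capable tools first, then
# "-exec ... {} +", and fall back to plain "-exec ... {} ;".
detect_find_mode() {
    if find /dev/null -print0 >/dev/null 2>&1 &&
       xargs -0 echo </dev/null >/dev/null 2>&1; then
        echo gnu
    elif find /dev/null -exec echo {} + >/dev/null 2>&1; then
        echo exec-plus
    else
        echo exec
    fi
}

mode=$(detect_find_mode)
echo "detected: $mode"
```

Before the patch, the second test came first, so a system supporting both
would have reported `exec-plus'.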




Acknowledgement sent to Spencer Baugh <sbaugh@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs@HIDDEN. Full text available.
Report forwarded to bug-gnu-emacs@HIDDEN:
bug#71094; Package emacs. Full text available.
Last modified: Wed, 22 May 2024 18:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.