Paul Eggert <eggert@HIDDEN>
to control <at> debbugs.gnu.org
.
From: Paul Eggert <eggert@HIDDEN>
To: Zev Weiss <zev@HIDDEN>, Jim Meyering <jim@HIDDEN>
Cc: Bjoern Jacke <bjoern@HIDDEN>, 23234-done <at> debbugs.gnu.org, 20768 <at> debbugs.gnu.org, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep 2.23
Date: Sun, 10 Apr 2016 15:09:17 -0700

On 04/10/2016 02:59 PM, Zev Weiss wrote:
> I still have my multithreading patch series
> (https://github.com/zevweiss/grep/) awaiting review, which I'd hope to
> get applied at some point, though I'd guess it's enough of a review
> task that delaying an impending release for it isn't likely (the
> mbtoupper()-removal patch made that series one patch shorter though,
> since one was to deal with that function's thread-unsafety). I've
> been rebasing it periodically and running it on my own system in
> /usr/local without any problems for a while now, for what that's worth.
>
> With current HEAD from savannah though, all check-very-expensive tests
> pass for me on Debian stretch with gcc 5.3, glibc 2.22, and Linux
> kernel 4.3.

Thanks for pinging us about this. Sorry, I kind of dropped the ball on this one. I will try to bump its priority. There are some other long-pending patches that also need review. I agree that these shouldn't delay the next release, but perhaps it could delay the release after that....
bug-grep@HIDDEN
:bug#20768
; Package grep
.
From: Paul Eggert <eggert@HIDDEN>
To: Zev Weiss <zev@HIDDEN>
Cc: 20768 <at> debbugs.gnu.org
Subject: Re: bug#20768: RFC: Multithreaded grep
Date: Wed, 01 Jul 2015 12:18:11 -0700

Zev Weiss wrote:
>
> Any further thoughts on this now that the paperwork's done?

Not yet I'm afraid. I do hope to get to grep tasks sometime this month....
bug-grep@HIDDEN
:bug#20768
; Package grep
.
From: Zev Weiss <zev@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Cc: 20768 <at> debbugs.gnu.org
Subject: Re: bug#20768: RFC: Multithreaded grep
Date: Wed, 1 Jul 2015 12:47:29 -0500

On Thu, Jun 11, 2015 at 11:44:28AM -0500, Zev Weiss wrote:
>On Tue, Jun 09, 2015 at 12:05:57AM -0700, Paul Eggert wrote:
>>Thanks very much for looking into this. I'll take a look at it in
>>more detail once the paperwork goes through. I use grep -r a lot,
>>and would appreciate the speedup.
>
>OK, I've received confirmation from Donald Robertson at the FSF that
>the assignment process is complete. I've rebased that branch on
>github with a few minor style & bug fixes since my initial email, but
>it's still basically as described.
>
>Zev
>

Any further thoughts on this now that the paperwork's done? I realize reviewing large patches is non-trivial, but if there's anything I can do to ease the process (e.g. changes to how the patches are organized) please let me know.

Thanks,
Zev
bug-grep@HIDDEN
:bug#20768
; Package grep
.
From: Zev Weiss <zev@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Cc: 20768 <at> debbugs.gnu.org
Subject: Re: bug#20768: RFC: Multithreaded grep
Date: Thu, 11 Jun 2015 11:44:28 -0500

On Tue, Jun 09, 2015 at 12:05:57AM -0700, Paul Eggert wrote:
>Thanks very much for looking into this. I'll take a look at it in
>more detail once the paperwork goes through. I use grep -r a lot, and
>would appreciate the speedup.

OK, I've received confirmation from Donald Robertson at the FSF that the assignment process is complete. I've rebased that branch on github with a few minor style & bug fixes since my initial email, but it's still basically as described.

Zev
bug-grep@HIDDEN
:bug#20768
; Package grep
.
From: Zev Weiss <zev@HIDDEN>
To: Aaron Crane <grep@HIDDEN>
Cc: 20768 <at> debbugs.gnu.org
Subject: Re: bug#20768: RFC: Multithreaded grep
Date: Tue, 9 Jun 2015 14:41:32 -0500

On Tue, Jun 09, 2015 at 12:04:11PM +0100, Aaron Crane wrote:
>Zev Weiss <zev@HIDDEN> wrote:
>> Hmm -- I picked --parallel largely for consistency with the corresponding
>> flag for coreutils' sort, which strikes me as a closer relative to grep than
>> either make or parallel.
>
>That's a good point; I wasn't aware of sort's --parallel option.
>Though I also note that "sort --parallel=4" limits the number of
>threads to 4, rather than increasing the number of threads from 1 to
>4, so the comparison isn't exact.
>
>> sort doesn't
>> have a matching short option though, so I went with -M to suggest
>> "multithreaded" (since, as you point out, -P is already in use). Though I
>> notice now that lower-case -p is still available; perhaps that might be
>> better than -M.
>
>I'm a little unhappy about the idea of proliferating the world's set
>of short options in this space, to be honest. If grep didn't already
>have -P, I'd be happy enough with -P and either --parallel or
>--max-procs, but I'm not terribly fond of the idea of introducing
>either -M or -p.
>
>--
>Aaron Crane ** http://aaroncrane.co.uk/

True, I suppose that's a reasonable concern (especially given how many there are now). My thought was that at least for me (and it sounds like perhaps Paul as well) this would be fairly likely to be a commonly used option, so I'd like a nice concise way of enabling it. With sort there's no real downside to just enabling multithreading by default, so a longopt-only flag is fine. With grep however (at least with my current implementation) there are tradeoffs with output ordering that may be undesirable (and which I don't see a good way around without introducing a bunch of potentially-complicated and performance-reducing per-file output buffering), so I kept it off by default.

There's also the question of the argument parsing mentioned in my original email -- as it stands now, '-M' would be the only short option with an optional argument, which has potential to be confusing.

Thinking about it a bit more, I realize that what I really want out of the short flag is just a shorter way to say --parallel=NUMCPUS (and not have to remember how many CPUs the machine I'm on has), so perhaps another possibility on that front would be to leave the long option as-is but have the short flag (assuming there is one) not take an argument (though I suppose that could perhaps be seen as confusing in its own way too).

Zev
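The output-ordering tradeoff mentioned in this message can be illustrated with a small sketch. This is not code from the patch series (which is written in C and is not shown in this thread); it is a hypothetical Python model, and the function names search_file and ordered_parallel_grep are made up for illustration. It shows the per-file buffering approach: each worker collects one file's matches in memory, and results are emitted strictly in argument order.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def search_file(pattern, path):
    """Return all matching lines of one file as a buffered list."""
    regex = re.compile(pattern)
    matches = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if regex.search(line):
                stripped = line.rstrip("\n")
                matches.append(f"{path}:{stripped}")
    return matches


def ordered_parallel_grep(pattern, paths, threads=4):
    """Search files concurrently, but emit matches in argument order.

    Executor.map yields results in the order of its inputs, so output
    for a later file is held back until every earlier file is done.
    That waiting, plus the memory for each buffer, is the cost of
    keeping the single-threaded output order.
    """
    output = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for matches in pool.map(lambda p: search_file(pattern, p), paths):
            output.extend(matches)
    return output
```

The sketch only models ordering; it says nothing about the speedup a real multithreaded grep would see. Dropping the buffering (printing each match as soon as a worker finds it) avoids the waiting and memory cost, but lines from different files can then appear interleaved, which is the behaviour the message describes as potentially undesirable.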
bug-grep@HIDDEN
:bug#20768
; Package grep
.
From: Aaron Crane <grep@HIDDEN>
To: Zev Weiss <zev@HIDDEN>
Cc: 20768 <at> debbugs.gnu.org
Subject: Re: bug#20768: RFC: Multithreaded grep
Date: Tue, 9 Jun 2015 12:04:11 +0100

Zev Weiss <zev@HIDDEN> wrote:
> Hmm -- I picked --parallel largely for consistency with the corresponding
> flag for coreutils' sort, which strikes me as a closer relative to grep than
> either make or parallel.

That's a good point; I wasn't aware of sort's --parallel option. Though I also note that "sort --parallel=4" limits the number of threads to 4, rather than increasing the number of threads from 1 to 4, so the comparison isn't exact.

> sort doesn't
> have a matching short option though, so I went with -M to suggest
> "multithreaded" (since, as you point out, -P is already in use). Though I
> notice now that lower-case -p is still available; perhaps that might be
> better than -M.

I'm a little unhappy about the idea of proliferating the world's set of short options in this space, to be honest. If grep didn't already have -P, I'd be happy enough with -P and either --parallel or --max-procs, but I'm not terribly fond of the idea of introducing either -M or -p.

--
Aaron Crane ** http://aaroncrane.co.uk/
bug-grep@HIDDEN
:bug#20768
; Package grep
.
From: Zev Weiss <zev@HIDDEN>
To: Aaron Crane <grep@HIDDEN>
Cc: 20768 <at> debbugs.gnu.org
Subject: Re: bug#20768: RFC: Multithreaded grep
Date: Tue, 9 Jun 2015 05:26:37 -0500

On Tue, Jun 09, 2015 at 11:00:34AM +0100, Aaron Crane wrote:
>Zev Weiss <zev@HIDDEN> wrote:
>> At a high level: I've added a new flag, -M/--parallel[=N], that
>> enables multithreaded operation.
>
>Thanks, this looks really interesting.
>
>I'd like to suggest changing the code to use -j/--jobs as the name for
>the relevant option; that would match both GNU Make and GNU parallel.
>(GNU parallel also allows -P/--max-procs as an alias, but -P already
>has a meaning in GNU grep.)
>
>--
>Aaron Crane ** http://aaroncrane.co.uk/

Hmm -- I picked --parallel largely for consistency with the corresponding flag for coreutils' sort, which strikes me as a closer relative to grep than either make or parallel. Also, the word "jobs" has (at least to me) a definite suggestion of multiple processes rather than threads. sort doesn't have a matching short option though, so I went with -M to suggest "multithreaded" (since, as you point out, -P is already in use). Though I notice now that lower-case -p is still available; perhaps that might be better than -M.

Zev
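For readers unfamiliar with the -M/--parallel[=N] notation quoted above, the following is a hypothetical sketch of how an option with an optional argument behaves under the usual GNU getopt convention: the value must be attached to the option ("--parallel=4" or "-M4"), and a bare option never consumes the next word. The function name parse_parallel is made up, and the assumption that a bare option means "use the CPU count" is taken from the NUMCPUS wish expressed elsewhere in this thread, not from the patch itself.

```python
import os


def parse_parallel(argv):
    """Split argv into (thread_count, remaining_arguments).

    Illustrative only: models the attached-value rule for optional
    arguments and ignores every other grep option.
    """
    threads = 1  # assumed default: single-threaded unless requested
    cpu_default = os.cpu_count() or 1
    rest = []
    for arg in argv:
        if arg in ("--parallel", "-M"):
            # Bare option: no value attached, so fall back to the CPU count.
            threads = cpu_default
        elif arg.startswith("--parallel="):
            threads = int(arg[len("--parallel="):])
        elif arg.startswith("-M"):
            # Attached short form such as -M4; int() rejects non-numbers.
            threads = int(arg[2:])
        else:
            rest.append(arg)
    return threads, rest
```

Under this rule, "-M 4 foo" does not request four threads: -M takes its default and "4" is left over as an ordinary argument, which for grep would be the pattern. This is one way a short option with an optional argument can surprise users.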
bug-grep@HIDDEN
:bug#20768
; Package grep
.
From: Aaron Crane <grep@HIDDEN>
To: Zev Weiss <zev@HIDDEN>
Cc: 20768 <at> debbugs.gnu.org
Subject: Re: bug#20768: RFC: Multithreaded grep
Date: Tue, 9 Jun 2015 11:00:34 +0100

Zev Weiss <zev@HIDDEN> wrote:
> At a high level: I've added a new flag, -M/--parallel[=N], that
> enables multithreaded operation.

Thanks, this looks really interesting.

I'd like to suggest changing the code to use -j/--jobs as the name for the relevant option; that would match both GNU Make and GNU parallel. (GNU parallel also allows -P/--max-procs as an alias, but -P already has a meaning in GNU grep.)

--
Aaron Crane ** http://aaroncrane.co.uk/
bug-grep@HIDDEN
:bug#20768
; Package grep
.
From: Paul Eggert <eggert@HIDDEN>
To: Zev Weiss <zev@HIDDEN>, 20768 <at> debbugs.gnu.org
Subject: Re: bug#20768: RFC: Multithreaded grep
Date: Tue, 09 Jun 2015 00:05:57 -0700

Thanks very much for looking into this. I'll take a look at it in more detail once the paperwork goes through. I use grep -r a lot, and would appreciate the speedup.
bug-grep@HIDDEN: bug#20768; Package grep.
From: Zev Weiss <zev@HIDDEN>
To: bug-grep@HIDDEN
Subject: RFC: Multithreaded grep
Date: Sun, 7 Jun 2015 23:25:43 -0500
Message-ID: <20150608042542.GH5705@HIDDEN>

Hello,

After what looks like 16+ years as an entry in the TODO file (back to the first commit in the git history), I decided to see if I couldn't get grep going multithreaded. I now have a working version, so I figured I'd try to get some feedback on it in its current form and hopefully ultimately get it included in GNU grep.

At a high level: I've added a new flag, -M/--parallel[=N], that enables multithreaded operation. The N worker threads (defaulting to the number of available CPUs) operate in parallel at file granularity, so it's only of use when operating on multiple files (perhaps via -R/-r/--directories=recurse). The main thread opens each file and adds it to a global queue of files to be searched ('workqueue' in src/grep.c), and the worker threads then pull items off the queue to search through.

For some rough performance numbers, on a 3.5GHz 4-core/8-thread Haswell running Linux 3.16 with a warm pagecache:

    $ time ./src/grep -r FOOB $DIR
    ...
    real    0m13.740s
    user    0m12.424s
    sys     0m2.376s

    $ time ./src/grep -M -r FOOB $DIR
    ...
    real    0m3.348s
    user    0m19.428s
    sys     0m3.480s

$DIR here is a 15GB tree (24299 directories, 305282 files) of assorted code -- git repos, tarballs, etc.

[Side note: this workload (perhaps grep in general, not sure) suffered a major performance regression between v2.20 and v2.21, which I bisected to commit cd36abd4 -- commit dfff75a4 then recovered some of the lost performance, but it's still substantially slower than it had been.]

If preferred I can send all the patches directly to the mailing list (or as one combined diff, whatever's convenient), but for now it's currently viewable as a series of mostly fairly fine-grained git commits in the 'multithread' branch at:

    https://github.com/zevweiss/grep

(A few of the small cleanup commits might also be of use independently of the multithreading effort.)
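
For readers unfamiliar with the pattern, the design described above (a main thread that opens files and puts them on a shared queue, and N worker threads that take them off and search them) can be sketched roughly as follows. This is an illustration in Python only, not code from the 'multithread' branch: the names `search_files` and `worker`, the plain substring match, the bounded `queue.Queue` standing in for the C 'workqueue', and the sorted result list are all assumptions made for the sketch.

```python
import os
import queue
import threading

def search_files(pattern, paths, nworkers=None):
    """Return sorted (path, line_number, line) matches for a fixed string.

    Hypothetical sketch: the calling thread opens each file and enqueues it;
    worker threads dequeue files and search them, so the parallelism is at
    file granularity only.
    """
    if nworkers is None:
        nworkers = os.cpu_count() or 1        # default: number of available CPUs
    # Bounded so the producer cannot run far ahead and hold many files open.
    # (An assumption of this sketch; the email does not say how the real
    # queue is bounded.)
    work = queue.Queue(maxsize=4 * nworkers)
    matches = []
    matches_lock = threading.Lock()           # the shared result list needs a lock

    def worker():
        while True:
            item = work.get()
            if item is None:                  # sentinel: no more files
                return
            path, fh = item
            with fh:
                found = [(path, n, line.rstrip("\n"))
                         for n, line in enumerate(fh, 1) if pattern in line]
            with matches_lock:
                matches.extend(found)

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    try:
        for path in paths:                    # main thread opens, workers search
            work.put((path, open(path, errors="replace")))
    finally:
        for _ in threads:
            work.put(None)                    # one sentinel per worker
        for t in threads:
            t.join()
    # Workers finish in a nondeterministic order; sorting is a simplification.
    return sorted(matches)
```

With a single file there is only one queue item, so only one worker ever has anything to do, which is why the flag is described as useful only for multi-file searches. In standard CPython builds the global interpreter lock also keeps these threads from searching in parallel, so the sketch shows the structure of the approach rather than its speedup.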
Some notes, caveats and such:

 - It doesn't presently build -- all my testing & development has involved running the final link command manually, because (despite hours of bashing my head against it) I've been unable to convince the bootstrap/autotools/gnulib/etc agglomeration to add -lpthread to the command line. Any advice here would be appreciated.

 - There's a slight quirk in command-line interpretation: since the '-M' short flag takes an optional numeric argument, 'grep -M4' will be interpreted as '--parallel=4', not '--parallel --context=4'. This seems safe from a backwards-compatibility standpoint (since no existing scripts should be using '-M'), but could potentially produce surprising results if existing scripts are modified in particular ways (e.g. changing 'grep -R4' to 'grep -RM4'). I'm open to suggestions on more graceful ways to handle this.

 - I'm not really at all familiar with the internal workings of grep's actual text-search guts, so some of the identifiers I've chosen in the parts that had to touch those bits (or elsewhere, for that matter) may not be the best.

 - Some of the un-sharing (moving global state into thread-private structs) is probably a bit heavy-handed; i.e. some things are now thread-private that could have remained global, but (at least for now) it was easier to just do it that way.

 - Along similar lines, it would probably be nicer to have a way to just copy the result of `compile()' rather than re-compiling for every thread.

 - Testing: I've extended a few of the tests in the testsuite to also run multithreaded, but the coverage isn't exactly extensive.

 - I haven't (yet) completed the FSF's copyright-assignment process, though I have sent the initial email requesting the paperwork.

 - Commit messages are in a different style than used in the grep git repo; I'll fix them up depending on how exactly they should be bundled together.

 - Though I have attempted to follow the GNU coding style, my own usual coding style differs from it quite a bit, and there may as a result be some stylistic misfits here and there. I'm happy to fix any of these anyone points out.

Any/all feedback welcome.

Thanks,
Zev Weiss
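
The '-M' parsing quirk mentioned in the notes above follows from how getopt-style scanning treats a short option with an optional argument: any text left in the same word after the option letter becomes that option's argument. The sketch below is a deliberately simplified Python model of that rule, not getopt itself and not grep's parser; the function name `parse_short_cluster`, the returned dictionary, and the restriction to the flags R, r, M and digit runs are assumptions made for illustration.

```python
def parse_short_cluster(word):
    """Parse one short-option word such as '-RM4' in a simplified model.

    Assumed rules (hypothetical, for illustration only):
      - 'R' and 'r' are plain flags;
      - a run of digits is the -NUM shorthand for --context=NUM;
      - 'M' takes an optional argument, so it swallows the rest of the word.
    """
    if not word.startswith("-") or len(word) < 2:
        raise ValueError("expected a short-option word like '-R4'")
    result = {"flags": set(), "parallel": None, "context": None}
    i = 1
    while i < len(word):
        ch = word[i]
        if ch == "M":
            rest = word[i + 1:]
            # Optional argument: whatever follows in this word belongs to -M.
            # int() raises ValueError if the remainder is not a number.
            result["parallel"] = int(rest) if rest else True
            return result
        if ch.isdigit():
            j = i
            while j < len(word) and word[j].isdigit():
                j += 1
            result["context"] = int(word[i:j])   # -NUM means --context=NUM
            i = j
            continue
        if ch in "Rr":
            result["flags"].add(ch)
        else:
            raise ValueError(f"unknown option -{ch}")
        i += 1
    return result
```

Under this model `parse_short_cluster("-R4")` sets the context to 4, whereas `parse_short_cluster("-RM4")` leaves the context unset and records 4 as the argument to -M, which is the surprise the note describes.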
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.