Received: (at 32073) by debbugs.gnu.org; 2 Jan 2020 01:04:17 +0000
From: Sergiu Hlihor <sh@HIDDEN>
Date: Thu, 2 Jan 2020 02:03:58 +0100
Subject: Re: Improvements in Grep (Bug#32073)
To: Jim Meyering <jim@HIDDEN>
Cc: 32073 <at> debbugs.gnu.org, Paul Eggert <eggert@HIDDEN>, Dennis Clarke <dclarke@HIDDEN>

Hi Jim,
The system for which this hurts me the most is an Ubuntu 14.04 system, where I'd need to run it as a separate binary. As I'm not familiar with the way it's built, are there any guidelines on how to build it from source? I'd happily build it with ever-larger block sizes and test.

On Thu, 2 Jan 2020 at 01:51, Jim Meyering <jim@HIDDEN> wrote:
> Hi Sergiu,
>
> If you would like to help make grep use larger buffer sizes, please
> run and report benchmarks measuring how much of a difference it would
> make, at least for your hardware. Here are some of the tests I ran to
> justify raising it from ~32k to ~96k:
> https://lists.gnu.org/archive/html/grep-devel/2018-10/msg00002.html
bug-grep@HIDDEN
:bug#32073
; Package grep
.
Received: (at 32073) by debbugs.gnu.org; 2 Jan 2020 00:51:21 +0000
From: Jim Meyering <jim@HIDDEN>
Date: Wed, 1 Jan 2020 16:51:00 -0800
Subject: Re: Improvements in Grep (Bug#32073)
To: Sergiu Hlihor <sh@HIDDEN>
Cc: 32073 <at> debbugs.gnu.org, Paul Eggert <eggert@HIDDEN>, Dennis Clarke <dclarke@HIDDEN>

On Wed, Jan 1, 2020 at 12:04 PM Sergiu Hlihor <sh@HIDDEN> wrote:
> Paul, I have to correct you. On a production server you usually have a mix of applications, many times including databases. For databases, having read-ahead means one IO less, since database access patterns are usually random reads. Here it is actually best to disable read-ahead completely. In fact, I have to say that it is probably best to disable read-ahead completely and let applications deal with it, either in an automatic fashion, like reading the optimal IO block size from the device, or in a configurable way with defaults good enough for today's servers. If you now configure the OS to do a read-ahead hitting all HDDs, then you induce potentially unnecessary IO load for all applications which use them, which with HDDs is totally unacceptable. That's why the best is to be application specific and ideally configured to use optimal IO block size.
>
> So no, letting the OS do it is stupid.
>
> On Wed, 1 Jan 2020 at 20:42, Paul Eggert <eggert@HIDDEN> wrote:
>>
>> On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
>> > If you rely on OS, then
>> > you are at the mercy of whatever read ahead configuration you have.
>>
>> Right, and whatever changes you make to the OS and its read-ahead configuration
>> will work for all applications, not just for 'grep'. So, change the OS to do
>> that. There shouldn't be a need to change 'grep' in particular (or 'cp' in
>> particular, or 'awk' in particular, etc.).
>>
>> > The issue of large
>> > block sizes for IO operations is widespread across all tools from Linux,
>> > like rsync or cp, and it's only getting worse
>>
>> Quite right. And it would be painful to have to modify all those tools, and to
>> maintain those modifications. So modify the OS instead. Scheduling read-ahead is
>> really the OS's job anyway.

Hi Sergiu,

If you would like to help make grep use larger buffer sizes, please
run and report benchmarks measuring how much of a difference it would
make, at least for your hardware. Here are some of the tests I ran to
justify raising it from ~32k to ~96k:
https://lists.gnu.org/archive/html/grep-devel/2018-10/msg00002.html
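The sketch below is one way to collect the kind of numbers Jim asks for. It is a minimal harness under stated assumptions, not grep's benchmark. It times plain sequential read(2) calls at several buffer sizes, from the former ~32 KiB up to 1 MiB. The names `time_sequential_read` and `report_buffer_sizes` and the list of sizes are hypothetical. It measures raw read throughput only, whereas Jim's linked tests timed grep itself. Repeated runs over the same file mostly measure the page cache. For cold-cache numbers, use a file larger than RAM, or drop the cache between runs (on Linux, as root: `sync; echo 3 > /proc/sys/vm/drop_caches`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: reads the whole file at PATH sequentially with
   read(2) calls of BUFSIZE bytes each.  Returns the number of bytes read
   (or -1 on error) and stores the elapsed wall-clock time in *SECONDS. */
long long time_sequential_read(const char *path, size_t bufsize,
                               double *seconds)
{
    assert(bufsize > 0 && seconds != NULL);
    char *buf = malloc(bufsize);
    if (buf == NULL)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, bufsize)) > 0)
        total += n;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    free(buf);
    if (n < 0)
        return -1;
    *seconds = (double) (t1.tv_sec - t0.tv_sec)
               + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return total;
}

/* Hypothetical driver: prints throughput for a few candidate sizes. */
void report_buffer_sizes(const char *path)
{
    static const size_t sizes[] = { 32 * 1024, 96 * 1024, 256 * 1024,
                                    1024 * 1024 };
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        double secs = 0.0;
        long long bytes = time_sequential_read(path, sizes[i], &secs);
        if (bytes < 0) {
            perror(path);
            return;
        }
        printf("%8zu bytes/read: %10.1f MiB/s\n", sizes[i],
               secs > 0 ? (double) bytes / secs / (1024.0 * 1024.0) : 0.0);
    }
}
```

A `main` that calls `report_buffer_sizes` on a large file on the disk in question would print one line per buffer size. The actual grep tests would still need to be run with grep built at each size.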
Received: (at submit) by debbugs.gnu.org; 1 Jan 2020 21:46:15 +0000
From: "Paul Jackson" <pj@HIDDEN>
Date: Wed, 01 Jan 2020 15:45:54 -0600
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
To: bug-grep@HIDDEN

From my old-Unix-fart viewpoint, Paul (the other Paul) is herding a hundred GNU cats: small command-line utilities, many of which date their origins back to the 1970s, many of which have over the years grown their own internal i/o routines with specific performance specializations, but few of which have much in the way of user-customizable i/o blocking and read-ahead settings.

Except for the last decade, those commands spent almost their entire lives running off spinning rust platters, which grew (immensely) in size over the years, but which did not change much in other performance characteristics.

Those commands are in general not well suited to adapting to provide optimal performance across the recent generation of storage devices, with their much more varied performance characteristics.

I'm guessing that Sergiu has some specific needs that grep seems to meet, except that grep (like its hundred cat siblings) lacks the tunable i/o characteristics needed to get maximum performance across a rapidly evolving variety of these more recent kinds of storage.

What I've done in situations such as the one I suspect Sergiu finds himself in is to code up a custom utility that met my specific needs when I had higher performance demands, while continuing to make extensive use of the general-purpose classic Unix/Linux command-line utilities that Paul E. now herds.

I can't imagine that it would make sense to attempt to recode a hundred classic GNU utilities so that each becomes an intelligently adaptable goat/pig/cat/dog/cow/bison/... depending on the i/o terrain it runs on.

Many, many thanks to Paul E. for herding these cats all these many years. I hope my weird comments do not cause him even the slightest distress.

(The word "cat" above refers to four-legged felines, not to the concatenate command-line utility.)

--
Paul Jackson
pj@HIDDEN
Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 21:02:49 +0000
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Date: Wed, 1 Jan 2020 13:02:38 -0800
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
To: Sergiu Hlihor <sh@HIDDEN>
Cc: 32073 <at> debbugs.gnu.org

On 1/1/20 12:04 PM, Sergiu Hlihor wrote:
> That's why the best is to be application specific

That doesn't mean that one should have to modify every application. One could instead modify the OS so that it uses different read-ahead heuristics for different classes of applications. This should be easier to manage than modifying every individual application.
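The thread does not bring it up, but POSIX already provides a per-file version of this idea: `posix_fadvise` lets a program tell the kernel how it will read a file, and the OS then tunes read-ahead for that file only. According to the Linux man page, `POSIX_FADV_SEQUENTIAL` doubles the device's default read-ahead window and `POSIX_FADV_RANDOM` disables read-ahead for the file. The latter fits the database case Sergiu describes. The sketch below is illustrative only. The helper name `hint_access_pattern` is hypothetical, and whether grep or any other tool should call it is not settled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: advises the kernel how FD will be read, so it can
   adjust read-ahead for this file alone.  posix_fadvise returns an error
   number instead of setting errno; since advice is only a hint, a failure
   is reported but the caller may carry on. */
int hint_access_pattern(int fd, int sequential)
{
    assert(fd >= 0);
    int advice = sequential ? POSIX_FADV_SEQUENTIAL : POSIX_FADV_RANDOM;
    /* len 0 extends the advice to end of file, so offset 0 covers it all. */
    int err = posix_fadvise(fd, 0, 0, advice);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
    return err;
}
```

A sequential scanner like grep would pass `sequential = 1` right after `open`, and a database would pass 0 for its data files. The OS keeps the heuristics, and the application only states its access pattern.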
Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 20:24:36 +0000
From: arnold@HIDDEN
Date: Wed, 01 Jan 2020 13:24:26 -0700
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
To: sh@HIDDEN, arnold@HIDDEN
Cc: 32073 <at> debbugs.gnu.org, eggert@HIDDEN

Hi.

Sergiu Hlihor <sh@HIDDEN> wrote:

> Arnold, there is no need to write user code; it is already done in
> benchmarks. One of the standard benchmarks when testing HDDs and SSDs is
> read throughput vs. block size and at different queue depths.

I think you're misunderstanding me, or I am misunderstanding you.

As the gawk maintainer, I can choose the buffer size to use every time I issue a read(2) system call for any given input file.

Gawk currently uses the smaller of (a) the file's size or (b) the st_blksize member of struct stat.

If I understand you correctly, this is "not enough"; gawk (grep, cp, etc.) should all use an optimal buffer size that depends upon the underlying storage hardware where the file is located.

So far, so good, except for: How do I determine what that number is? I cannot run a benchmark before opening each and every file. I don't know of a system call that will give me that number. (If there is, please point me to it.)

Or do you just want a command-line option or environment variable that you, as the application user, can set? If the latter, it happens that gawk will let you set AWKBUFSIZE and it will use whatever number you supply for doing reads. (This is even documented.)

HTH,

Arnold
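Arnold's description of the gawk policy can be sketched in C as follows. The policy is the smaller of the file's size and `st_blksize`, with a user-supplied size such as `AWKBUFSIZE` taking precedence. This is a hypothetical helper, not gawk's source. The name `pick_read_size`, the 8192-byte fallback, and the choice to ignore `st_size` for empty and non-regular files (pipes, terminals) are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

#define FALLBACK_BUFSIZE 8192 /* hypothetical default when stat gives nothing */

/* Hypothetical helper, not gawk's code.  OVERRIDE is a user-supplied size
   string (for example an environment variable's value) or NULL. */
size_t pick_read_size(int fd, const char *override)
{
    assert(fd >= 0);

    /* A positive decimal override wins outright. */
    if (override != NULL && *override >= '0' && *override <= '9') {
        char *end;
        errno = 0;
        unsigned long long v = strtoull(override, &end, 10);
        if (errno == 0 && *end == '\0' && v > 0)
            return (size_t) v;
    }

    struct stat st;
    if (fstat(fd, &st) != 0)
        return FALLBACK_BUFSIZE;

    /* st_blksize: the file system's preferred I/O size for this file. */
    size_t blk = st.st_blksize > 0 ? (size_t) st.st_blksize : FALLBACK_BUFSIZE;

    /* For a non-empty regular file, take the smaller of size and blksize. */
    if (S_ISREG(st.st_mode) && st.st_size > 0
        && (unsigned long long) st.st_size < (unsigned long long) blk)
        return (size_t) st.st_size;
    return blk;
}
```

A caller might write `pick_read_size(fd, getenv("AWKBUFSIZE"))`, although gawk's own parsing of that variable may differ. `st_blksize` describes the file system's preference, not the device's throughput curve, and that is the gap Sergiu is pointing at.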
Full text available.Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 20:04:59 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 15:04:59 2020 Received: from localhost ([127.0.0.1]:37607 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1imkEk-0007qg-W0 for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 15:04:59 -0500 Received: from mail-il1-f174.google.com ([209.85.166.174]:38191) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <sh@HIDDEN>) id 1imkEi-0007qS-HO for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 15:04:57 -0500 Received: by mail-il1-f174.google.com with SMTP id f5so32700534ilq.5 for <32073 <at> debbugs.gnu.org>; Wed, 01 Jan 2020 12:04:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=discovergy-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=f3gPqw//sPzxPArZaLCn5qCkkS0muRBetlNjDRv9cFw=; b=PtSs/uSm5aVIUCaZXg5w2ZCQsGUa5lQcpOH77ANuNNf+2piUl9tePpfnUfa+N231b4 LN4/iPcDPDuxS0SIErtA/9cOBH/lAoggtTqhmsze0Itxtal1Q9rl/k8kp8VqGzZQpQob Ug/YVEttA1WULSbvtaLmx1SjBtb/oyt+GX5JZGxYNo9Ww3dc7YmUWz2t358Kk8eHku4n AAuP6kIkhOBQGZrqMzVe6dGCeElWKUgInkinYqpWWinD5gPCuIskIA2m6WHb/ZzWtYJO Bg3i9doIJ05U5BZhHYJqmkAV0+RhRClx2oYc0GcSnvtQFY0w8BnZ0HwT6ojKsICI+GOj Npwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=f3gPqw//sPzxPArZaLCn5qCkkS0muRBetlNjDRv9cFw=; b=TTMmEUMp/2qYER9W7/OH0NJfNkVCjbQ93a4SZ7lKU/VMOUdrH4ntOlQ9Amyk0MU/v1 3RsWIXbLs3d5Bvod84nNtN3oRc6770kVemblTN0zGh591o2vySDfEU7lFqo/SN++ugiw BFq+RXTDSXQdUrhRnmBlhWSeWncdp2Zwyye0U5zGxiT7oc7gzH9rck9fxd7lIUXd5zV5 qYdiLaSYKrJKhjB0ursaf6rybkB+EzQntUFGQodz0ImJBmSAGPvVKXbP4gEKmsEpz3ZQ KWsWsQ+MbGam9m5Lz5hvB3M1Nk7epL+P+v4dWFHy7xqAqF3gcW/QJnnkNCea6nvBcBot uwzg== X-Gm-Message-State: 
APjAAAUfpXpEeR5HZA5G+rM0bsnSPtOv+/39pEfYmiBjqtAWV0/L+AwN QTe0uc5cRMdRzLiUpQyGFaZaG5fYHUKEpK6C02kW/Q== X-Google-Smtp-Source: APXvYqyXJkilmFlV4mF5TVSrUwHwx6bE4fHBFq6X4gjgNZsVhsoYnsx/3nyr/iAeOnF8GKBQd+dW7wxgbdEBWGsQysk= X-Received: by 2002:a92:ce09:: with SMTP id b9mr64895585ilo.219.1577909091082; Wed, 01 Jan 2020 12:04:51 -0800 (PST) MIME-Version: 1.0 References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN> <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN> <299d76d3-09d2-8c4d-3b1f-0b2205c03db7@HIDDEN> In-Reply-To: <299d76d3-09d2-8c4d-3b1f-0b2205c03db7@HIDDEN> From: Sergiu Hlihor <sh@HIDDEN> Date: Wed, 1 Jan 2020 21:04:39 +0100 Message-ID: <CAD-3cdeARpf+yBqSf0uF00Y3z6xrRksjz-5CarqrgPiEXnH_Mw@HIDDEN> Subject: Re: Improvements in Grep (Bug#32073) To: Paul Eggert <eggert@HIDDEN> Content-Type: multipart/alternative; boundary="000000000000dcbab1059b199639" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 32073 Cc: 32073 <at> debbugs.gnu.org, Dennis Clarke <dclarke@HIDDEN>, Jim Meyering <jim@HIDDEN> X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -1.0 (-) --000000000000dcbab1059b199639 Content-Type: text/plain; charset="UTF-8" Paul, I have to correct you. 
On a production server you usually have a mix of applications, often including databases. For databases, read-ahead effectively wastes an I/O, since database access patterns are usually random reads; there it is actually best to disable read-ahead completely. In fact, I would say it is probably best to disable read-ahead altogether and let applications deal with it, either automatically (for example, by reading the optimal I/O block size from the device) or through configuration, with defaults that are good enough for today's servers. If you instead configure the OS to do a read-ahead that hits all HDDs, you potentially induce unnecessary I/O load on every application using them, which is totally unacceptable with HDDs. That's why it is best for this to be application-specific, ideally configured to use the optimal I/O block size.

So no, leaving it to the OS is stupid.

On Wed, 1 Jan 2020 at 20:42, Paul Eggert <eggert@HIDDEN> wrote:
> On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
> > If you rely on OS, then
> > you are at the mercy of whatever read ahead configuration you have.
>
> Right, and whatever changes you make to the OS and its read-ahead configuration
> will work for all applications, not just for 'grep'. So, change the OS to do
> that. There shouldn't be a need to change 'grep' in particular (or 'cp' in
> particular, or 'awk' in particular, etc.).
>
> > The issue of large
> > block sizes for IO operations is widespread across all tools from Linux,
> > like rsync or cp and its only getting worse
>
> Quite right. And it would be painful to have to modify all those tools, and to
> maintain those modifications. So modify the OS instead. Scheduling read-ahead is
> really the OS's job anyway.
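Per-application control of read-ahead already exists in POSIX: a program can describe its access pattern with `posix_fadvise`, so a database can ask for no read-ahead on its files while a sequential scanner asks for more. The following is a minimal C sketch, assuming a system that provides `posix_fadvise` (Linux/glibc does). On Linux, `POSIX_FADV_RANDOM` disables read-ahead for that open file and `POSIX_FADV_SEQUENTIAL` doubles the device's default window; other kernels may treat the advice differently or ignore it. The enum and `open_with_advice` are hypothetical names for illustration, not something grep is claimed to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>    /* open, posix_fadvise, POSIX_FADV_* */
#include <stdio.h>    /* fprintf */
#include <string.h>   /* strerror */
#include <unistd.h>   /* close */

/* Hypothetical access-pattern selector for this sketch. */
enum access_pattern { ACCESS_RANDOM, ACCESS_SEQUENTIAL };

/* Open path read-only and tell the kernel how it will be read.
   Returns the descriptor, or -1 if open or posix_fadvise fails. */
static int open_with_advice(const char *path, enum access_pattern pattern)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int advice = (pattern == ACCESS_RANDOM) ? POSIX_FADV_RANDOM
                                            : POSIX_FADV_SEQUENTIAL;
    /* Offset 0 and length 0 apply the advice to the whole file.
       posix_fadvise returns an error number; it does not set errno. */
    int err = posix_fadvise(fd, 0, 0, advice);
    if (err != 0) {
        fprintf(stderr, "posix_fadvise(%s): %s\n", path, strerror(err));
        close(fd);
        return -1;
    }
    return fd;
}
```

The advice is only a hint attached to one open file, so it changes read-ahead for that application without touching the system-wide setting that other programs share.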
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
To: Sergiu Hlihor <sh@HIDDEN>
Cc: 32073 <at> debbugs.gnu.org, Dennis Clarke <dclarke@HIDDEN>, Jim Meyering <jim@HIDDEN>
Date: Wed, 1 Jan 2020 11:42:53 -0800
Subject: Re: Improvements in Grep (Bug#32073)

On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
> If you rely on OS, then
> you are at the mercy of whatever read ahead configuration you have.

Right, and whatever changes you make to the OS and its read-ahead configuration
will work for all applications, not just for 'grep'. So, change the OS to do
that. There shouldn't be a need to change 'grep' in particular (or 'cp' in
particular, or 'awk' in particular, etc.).

> The issue of large
> block sizes for IO operations is widespread across all tools from Linux,
> like rsync or cp and its only getting worse

Quite right. And it would be painful to have to modify all those tools, and to
maintain those modifications. So modify the OS instead. Scheduling read-ahead is
really the OS's job anyway.
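On Linux, the OS-level read-ahead setting discussed here is kept per block device and is visible in sysfs as `/sys/block/<dev>/queue/read_ahead_kb`; root can change it there or with `blockdev --setra`, which counts 512-byte sectors. A minimal C sketch, assuming a Linux system, that reads the current value; the function name `read_ahead_kb` is hypothetical, and the path is a parameter because device names vary between machines.

```c
#include <assert.h>
#include <stdio.h>

/* Return the read-ahead window, in KiB, stored in a Linux sysfs file such
   as "/sys/block/sda/queue/read_ahead_kb", or -1 if it cannot be read. */
static long read_ahead_kb(const char *sysfs_path)
{
    assert(sysfs_path != NULL);
    FILE *f = fopen(sysfs_path, "r");
    if (f == NULL)
        return -1;

    long kb;
    if (fscanf(f, "%ld", &kb) != 1)
        kb = -1;   /* file exists but does not start with a number */
    fclose(f);
    return kb;
}
```

On many Linux systems this reports 128, in line with the "typically 128KB" figure Sergiu cites; RAID and other stacked devices may use different defaults.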
From: Sergiu Hlihor <sh@HIDDEN>
To: arnold@HIDDEN
Cc: 32073 <at> debbugs.gnu.org, Paul Eggert <eggert@HIDDEN>
Date: Wed, 1 Jan 2020 20:06:39 +0100
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)

Arnold, there is no need to write user code; it has already been done in benchmarks.
One of the standard benchmarks when testing HDDs and SSDs is read throughput versus block size at different queue depths. Take a look at this: https://www.servethehome.com/wp-content/uploads/2019/12/Corsair-Force-MP600-1TB-ATTO.jpg . In this benchmark, at queue depth 4 and a 128KB block size, the SSD had not yet reached its maximum throughput of 5GB/s. Moreover, if you extrapolate the results to a queue depth of 1, you get about 1.2GB/s out of a theoretical 5GB/s or more. Therefore, for this particular model you need to issue read requests of at least 512KB to achieve maximum throughput. With hard drives I have already explained the issue. I have a production server where the HDD RAID array can theoretically do 2.5GB/s, and I see sustained read speeds over 500MB/s when large block sizes are used for reads, yet when I use grep I get a practical bandwidth of 20 to 50 MB/s. Moreover, when it comes to HDDs the math is quite simple. Here it is for a standard 7200 RPM HDD with 240MB/s throughput:
7200 RPM => 120 revolutions per second
240 MB/s at 120 revolutions per second => 2MB per revolution
One revolution time = 1000/120 => 8.33 ms
Read throughput per ms = 240KB

Worst-case scenario: each read request has to wait a full revolution for the data to come under the head (head positioning happens concurrently and can be ignored).
Rotational latency: 8.33ms
At 96KB:
 - Read time: 0.4ms
 - Total read latency = 8.33 + 0.4 = 8.73ms, read throughput = 1000 / 8.73 * 96KB = 11MB/s
At 512KB:
 - Read time: 2.13ms
 - Total read latency = 8.33 + 2.13 = 10.46ms, read throughput = 1000 / 10.46 * 512KB = 49MB/s
In practice the average rotational latency is half a revolution, 4.17ms, so throughput roughly doubles for small blocks. This is the cold, hard reality. When each of you tests, you are very likely deceived by testing on *one HDD, on an idle system*, where nothing else, like a database, is consuming I/O in the background. In such an ideal scenario you do see 240MB/s, because HDDs also read ahead: by the time the data has been transferred over the interface and consumed, the next chunk is already in the drive's buffer and can be delivered with an apparent seek time of 0. This means the first read takes about 4ms and the following ones about 0.1ms. With an *HDD RAID array on a server where I/O is always at 50% load*, if you have a strip size of 128KB or more, you are hitting one drive at a time, each with a penalty of 4.17ms. And because of the constant load, by the time you hit the first HDD again, the read-ahead buffer maintained by the HDD itself has also been discarded, so all reads go directly to the physical medium. If, however, you hit all HDDs at the same time, you benefit from the HDDs' own read-ahead for at least one or more cycles, getting reads with an apparent latency of 0 and a much higher average bandwidth. The cost of reading from all HDDs at the same time is potentially adding extra latency for all other running applications; this is why the value should be configurable, so that the best value can be set based on the hardware.

The issue of large block sizes for I/O operations is widespread across Linux tools, like rsync or cp, and it's only getting worse, to the extent that my company is considering writing our own tools for something that should have worked out of the box. One side issue, which I have to mention since I'm not aware of the implementation details: as we get into GB/s territory, reading is best done in its own thread, which then hands the data to the processing thread. With SSDs that can do multiple GB/s, this matters.

On Wed, 1 Jan 2020 at 12:19, <arnold@HIDDEN> wrote:
> As a quite serious question, how is someone writing user-level code
> supposed to be able to figure out the right buffer size for a particular
> file, and to do so portably? ("Show me the code.")
>
> Gawk bases its reads on the st_blksize member in struct stat.  That will
> typically be something like 4K - not nearly enough, given your description
> below.
>
> Arnold
>
> Sergiu Hlihor <sh@HIDDEN> wrote:
>
> > This topic is getting more and more frustrating. If you rely on OS, then
> > you are at the mercy of whatever read ahead configuration you have. And
> > read ahead is typically 128KB so does not help that much. A HDD RAID 10
> > array with 12 disks and a strip size of 128KB reaches the maximum read
> > throughput if read block size is 6 * 128 = 768KB. When issuing read
> > requests with 128KB , you only hit one HDD, having 1/6 read throughput.
> > With flash the same. A state of the art SSD that can do 5GB/s reads can
> > actually do around 1GB/s or less at 128KB block size. Why is so hard to
> > understand how hardware works and the fact that you need huge block sizes
> > to actually read at full speed? Why not just exposing the read buffer size
> > as a configurable parameter, then anyone can just tune it as needed? 96KB
> > is purely retarded.
> >
> > On Wed, 1 Jan 2020 at 08:52, Paul Eggert <eggert@HIDDEN> wrote:
> >
> > > > This makes me think we should follow Coreutils' lead[0] and increase
> > > > grep's initial buffer size from 32KiB, probably to 128KiB.
> > >
> > > I see that Jim later installed a patch increasing it to 96 KiB.
> > >
> > > Whatever number is chosen, it's "wrong" for some configuration. And I
> > > suppose the particular configuration that Sergiu Hlihor mentioned could
> > > be tweaked so that it worked better with grep (and with other programs).
> > >
> > > I'm inclined to mark this bug report as a wishlist item, in the sense
> > > that it'd be nice if grep and/or the OS could pick buffer sizes more
> > > intelligently (though it's not clear how grep and/or the OS could go
> > > about this).
> > >
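The HDD arithmetic in Sergiu's message can be checked with a small model, sketched here in C under that message's own assumptions: every request pays a fixed rotational delay (a full revolution in the worst case, half a revolution on average) and then transfers at the 240 KB/ms media rate, while seek time, drive caching, and queueing are ignored. The function name `effective_mb_per_s` is hypothetical.

```c
#include <assert.h>

/* Effective throughput of one stream of synchronous reads, in MB/s,
   using decimal units so that 1 KB/ms == 1 MB/s.
     block_kb        size of each read request, in KB
     latency_ms      delay paid before every request, in ms
     media_kb_per_ms sustained transfer rate from the platter, in KB/ms */
static double effective_mb_per_s(double block_kb, double latency_ms,
                                 double media_kb_per_ms)
{
    assert(block_kb > 0 && latency_ms >= 0 && media_kb_per_ms > 0);
    double transfer_ms = block_kb / media_kb_per_ms;
    return block_kb / (latency_ms + transfer_ms);
}
```

With a full revolution of 8.33 ms, `effective_mb_per_s(96, 8.33, 240)` is about 11 and `effective_mb_per_s(512, 8.33, 240)` about 49 (512/240 is about 2.13 ms of transfer). With the 4.17 ms average latency the same calls give about 21 and 81, so the gain from the shorter wait is close to 2x only while transfer time is small compared with the latency.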
From: "Paul Jackson" <pj@HIDDEN>
To: bug-grep@HIDDEN
Date: Wed, 01 Jan 2020 05:26:04 -0600
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)

>> Why not just exposing the read buffer size as a configurable parameter ...

Take a look at the (and I quote) "Hairy buffering mechanism for grep" input buffering code in the grep source file grep-3.3/src/grep.c, then you tell me why it's not a runtime variable parameter <grin>.

In other words, the input (and output) i/o buffering and performance tuning for various situations and kinds of files has been tuned and refined over many years. Doing something to the code, such as making buffer size a run time adjustable parameter, would probably not be easy, would risk making one usage of grep slower in order to make some other usage faster, and would risk some nasty bugs.

-- 
Paul Jackson
pj@HIDDEN
From: arnold@HIDDEN
To: sh@HIDDEN, eggert@HIDDEN
Cc: 32073 <at> debbugs.gnu.org
Date: Wed, 01 Jan 2020 04:19:22 -0700
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)

As a quite serious question, how is someone writing user-level code
supposed to be able to figure out the right buffer size for a particular
file, and to do so portably? ("Show me the code.")

Gawk bases its reads on the st_blksize member in struct stat. That will
typically be something like 4K - not nearly enough, given your
description below.

Arnold

Sergiu Hlihor <sh@HIDDEN> wrote:

> This topic is getting more and more frustrating. If you rely on OS, then
> you are at the mercy of whatever read ahead configuration you have. And
> read ahead is typically 128KB so does not help that much. A HDD RAID 10
> array with 12 disks and a strip size of 128KB reaches the maximum read
> throughput if read block size is 6 * 128 = 768KB. When issuing read
> requests with 128KB , you only hit one HDD, having 1/6 read throughput.
> With flash the same. A state of the art SSD that can do 5GB/s reads can
> actually do around 1GB/s or less at 128KB block size. Why is so hard to
> understand how hardware works and the fact that you need huge block sizes
> to actually read at full speed? Why not just exposing the read buffer size
> as a configurable parameter, then anyone can just tune it as needed? 96KB
> is purely retarded.
>
> On Wed, 1 Jan 2020 at 08:52, Paul Eggert <eggert@HIDDEN> wrote:
>
> > > This makes me think we should follow Coreutils' lead[0] and increase
> > > grep's initial buffer size from 32KiB, probably to 128KiB.
> >
> > I see that Jim later installed a patch increasing it to 96 KiB.
> >
> > Whatever number is chosen, it's "wrong" for some configuration. And I
> > suppose the particular configuration that Sergiu Hlihor mentioned could
> > be tweaked so that it worked better with grep (and with other programs).
> >
> > I'm inclined to mark this bug report as a wishlist item, in the sense
> > that it'd be nice if grep and/or the OS could pick buffer sizes more
> > intelligently (though it's not clear how grep and/or the OS could go
> > about this).
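Arnold's question invites code. The following is a minimal C sketch of one possible answer, assuming a fixed floor. It starts from st_blksize, as gawk does, and rounds it up to the smallest multiple that is at least 128 KiB. The floor value and the names MIN_READ_SIZE and pick_read_size are illustrative assumptions. The approach is loosely in the spirit of how coreutils combines a constant buffer size with st_blksize, but none of this code is taken from gawk, grep, or coreutils.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Floor for the read size.  128 KiB is the value discussed in this
   thread; it is an assumption here, not a recommendation.  */
enum { MIN_READ_SIZE = 128 * 1024 };

/* Return a read size for the open file descriptor FD: the file
   system's preferred block size (st_blksize, which gawk uses),
   rounded up to the smallest multiple of itself that is at least
   MIN_READ_SIZE.  If fstat fails or reports no block size, fall
   back to MIN_READ_SIZE.  */
static size_t
pick_read_size (int fd)
{
  struct stat st;
  if (fstat (fd, &st) != 0 || st.st_blksize <= 0)
    return MIN_READ_SIZE;

  size_t blk = (size_t) st.st_blksize;
  /* Smallest multiple of BLK that is >= MIN_READ_SIZE; if BLK is
     already larger than the floor, this is BLK itself.  */
  size_t size = (MIN_READ_SIZE + blk - 1) / blk * blk;
  assert (size >= MIN_READ_SIZE && size % blk == 0);
  return size;
}
```

A caller would open the file, call pick_read_size (fd) once, and allocate its read buffer with that size. On a file system that reports a 4 KiB st_blksize, the result is 128 KiB. This only moves the fixed number to a different place. As Paul says, any constant is wrong for some configuration, and a 4 KiB st_blksize says nothing about the RAID geometry Sergiu describes.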
bug-grep@HIDDEN:bug#32073; Package grep.
From: Sergiu Hlihor <sh@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Cc: 32073 <at> debbugs.gnu.org, Dennis Clarke <dclarke@HIDDEN>, Jim Meyering <jim@HIDDEN>
Date: Wed, 1 Jan 2020 10:15:16 +0100
Subject: Re: Improvements in Grep (Bug#32073)

This topic is getting more and more frustrating. If you rely on OS, then
you are at the mercy of whatever read ahead configuration you have. And
read ahead is typically 128KB so does not help that much. A HDD RAID 10
array with 12 disks and a strip size of 128KB reaches the maximum read
throughput if read block size is 6 * 128 = 768KB. When issuing read
requests with 128KB , you only hit one HDD, having 1/6 read throughput.
With flash the same. A state of the art SSD that can do 5GB/s reads can
actually do around 1GB/s or less at 128KB block size. Why is so hard to
understand how hardware works and the fact that you need huge block sizes
to actually read at full speed? Why not just exposing the read buffer size
as a configurable parameter, then anyone can just tune it as needed? 96KB
is purely retarded.

On Wed, 1 Jan 2020 at 08:52, Paul Eggert <eggert@HIDDEN> wrote:

> > This makes me think we should follow Coreutils' lead[0] and increase
> > grep's initial buffer size from 32KiB, probably to 128KiB.
>
> I see that Jim later installed a patch increasing it to 96 KiB.
>
> Whatever number is chosen, it's "wrong" for some configuration. And I
> suppose the particular configuration that Sergiu Hlihor mentioned could
> be tweaked so that it worked better with grep (and with other programs).
>
> I'm inclined to mark this bug report as a wishlist item, in the sense
> that it'd be nice if grep and/or the OS could pick buffer sizes more
> intelligently (though it's not clear how grep and/or the OS could go
> about this).
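Sergiu's 768KB figure follows from his model of RAID 10. Disks are mirrored in pairs and data is striped across the pairs, so 12 disks give 6 stripe members. A read touches all of them only when it spans 6 strips. The C helpers below restate that arithmetic under his assumptions (strip-aligned reads, each stripe member counted once). The function names are hypothetical and do not come from any RAID tool.

```c
#include <assert.h>
#include <stddef.h>

/* Sergiu's model of RAID 10: DISKS drives are mirrored in pairs, and
   data is striped across the DISKS / 2 pairs, one strip of
   STRIP_BYTES per stripe member.  */

/* Bytes a single read must cover to touch every stripe member once.  */
static size_t
raid10_full_stripe_bytes (unsigned disks, size_t strip_bytes)
{
  assert (disks >= 2 && disks % 2 == 0 && strip_bytes > 0);
  return (size_t) (disks / 2) * strip_bytes;
}

/* Stripe members touched by one strip-aligned read of READ_BYTES,
   capped at the number of members.  */
static unsigned
raid10_members_touched (unsigned disks, size_t strip_bytes,
                        size_t read_bytes)
{
  assert (disks >= 2 && disks % 2 == 0 && strip_bytes > 0);
  unsigned members = disks / 2;
  size_t strips = (read_bytes + strip_bytes - 1) / strip_bytes;
  return strips < members ? (unsigned) strips : members;
}
```

With the numbers from the message, raid10_full_stripe_bytes (12, 128 * 1024) is 768 KiB. A 128 KiB or 96 KiB read touches only one of the six members, which is where the "1/6 read throughput" estimate comes from.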
bug-grep@HIDDEN:bug#32073; Package grep.
Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org.
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
To: Sergiu Hlihor <sh@HIDDEN>
Cc: 32073 <at> debbugs.gnu.org, Dennis Clarke <dclarke@HIDDEN>, Jim Meyering <jim@HIDDEN>
Date: Tue, 31 Dec 2019 23:52:54 -0800
Subject: Re: Improvements in Grep (Bug#32073)

> This makes me think we should follow Coreutils' lead[0] and increase
> grep's initial buffer size from 32KiB, probably to 128KiB.

I see that Jim later installed a patch increasing it to 96 KiB.

Whatever number is chosen, it's "wrong" for some configuration. And I
suppose the particular configuration that Sergiu Hlihor mentioned could be
tweaked so that it worked better with grep (and with other programs).

I'm inclined to mark this bug report as a wishlist item, in the sense that
it'd be nice if grep and/or the OS could pick buffer sizes more
intelligently (though it's not clear how grep and/or the OS could go about
this).
bug-grep@HIDDEN:bug#32073; Package grep.
From: Sergiu Hlihor <sh@HIDDEN>
To: Jim Meyering <jim@HIDDEN>
Cc: 32073 <at> debbugs.gnu.org
Date: Sat, 7 Jul 2018 03:31:19 +0200
Subject: Re: bug#32073: Improvements in Grep

To add, the increase to 128KiB is good, but for RAID arrays with light to
medium load, this is not sufficient. In a system without any load, the HDD
can read ahead and always serve the next request from buffer thus reading
at full sequential speed of ~200MB/s . In a RAID 10 configuration with 12
hdds where strip size is set to 128KB, every HDD is hit at every 6th
request. There is enough delay between reads hitting the same drive that
the read ahead buffer often gets discarded which basically limits the
throughput to max IOPS x buffer size = ~10-20MiB for 128KiB.

I have such systems in production environments and I often see read speeds
under 10MiB and read await >10ms which means that read ahead buffer is
already discarded. At the same load conditions, if I read the data using
utilities which can do 512KiB buffer size, I see read speed varying between
50 and 400MiB. Grep has an average CPU load of 2-3% of the given machine
under such low reads, therefore it can do much more if reading is
optimized.

On 7 July 2018 at 02:33, Jim Meyering <jim@HIDDEN> wrote:

> On Fri, Jul 6, 2018 at 9:26 AM, Sergiu Hlihor <sh@HIDDEN> wrote:
> > Hello,
> >     I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While
> > grepping over large files I've noticed Grep is painfully slow. The
> > bottleneck seems to be the read block which is extremely low (looks like
> > 64KB). For large files residing over big HDD RAID arrays, this request
> > barely reaches one drive and based on CPU usage, grep is idling more or
> > less. Given my tests for such scenarios, a read block size of at least
> > 512KB would be way more efficient. It's very likely that optimum would be
> > 1MB+. Also, such increase in buffer size would also benefit slightly SSDs
> > where maximum sequential throughput is usually achieved when reading at
> > 256KB+ block size.
> >     If this is already possible in newer versions or configurable, I'd
> > appreciate some hints about the new version which contains or about the way
> > I can configure it to increase the read block size.
>
> Thanks for raising the issue.
> This makes me think we should follow Coreutils' lead[0] and increase
> grep's initial buffer size from 32KiB, probably to 128KiB. I will time
> with the attached diff on a few systems.
>
> [0] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=v8.22-103-g74ca6e84c

-- 
_____________________________________________
Senior Software Engineer & Team leader
Phone: +49 (0) 6221 7787-481
Email: sh@HIDDEN
Discovergy GmbH
_____________________________________________
Commercial register: Amtsgericht Aachen, HRB 15391
Managing directors: Ralf Esser | Bernhard Seidl | Nikolaus Starzacher
This e-mail and any attached files are intended only for the recipient
named above and may contain confidential information. If you are not the
recipient, any distribution, forwarding, or copying is prohibited. If you
have received this e-mail by mistake, please send it back or notify the
sender immediately using the contact details above. In that case, please
delete this message immediately. Thank you.
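Sergiu's "max IOPS x buffer size" limit is a seek-bound estimate. Once a drive's read-ahead has been discarded, each request costs roughly one random I/O, so throughput is about the number of random reads per second times the request size. The C helper below only restates that formula. The IOPS values used in the checks (80 and 160) are assumptions, chosen because they reproduce his ~10-20MiB range for 128 KiB requests.

```c
#include <assert.h>
#include <stddef.h>

/* Rough per-drive throughput bound, in MiB/s, when every request pays
   a full random I/O (seek plus rotation) because read-ahead data was
   discarded between requests: IOPS * request size.  */
static double
seek_bound_mib_per_s (double iops, size_t request_bytes)
{
  assert (iops >= 0.0);
  return iops * (double) request_bytes / (1024.0 * 1024.0);
}
```

At an assumed 80 to 160 random reads per second, 128 KiB requests allow 10 to 20 MiB/s. The same rates with 512 KiB requests allow four times as much (40 to 80 MiB/s). The formula ignores queueing and concurrency across drives, so treat it as an order-of-magnitude check rather than a prediction.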
bug-grep@HIDDEN:bug#32073; Package grep.
From: Jim Meyering <jim@HIDDEN>
To: Sergiu Hlihor <sh@HIDDEN>
Cc: 32073 <at> debbugs.gnu.org
Date: Fri, 6 Jul 2018 17:33:08 -0700
Subject: Re: bug#32073: Improvements in Grep

On Fri, Jul 6, 2018 at 9:26 AM, Sergiu Hlihor <sh@HIDDEN> wrote:
> Hello,
>     I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While
> grepping over large files I've noticed Grep is painfully slow. The
> bottleneck seems to be the read block which is extremely low (looks like
> 64KB). For large files residing over big HDD RAID arrays, this request
> barely reaches one drive and based on CPU usage, grep is idling more or
> less. Given my tests for such scenarios, a read block size of at least
> 512KB would be way more efficient. It's very likely that optimum would be
> 1MB+. Also, such increase in buffer size would also benefit slightly SSDs
> where maximum sequential throughput is usually achieved when reading at
> 256KB+ block size.
>     If this is already possible in newer versions or configurable, I'd
> appreciate some hints about the new version which contains or about the way
> I can configure it to increase the read block size.

Thanks for raising the issue.
This makes me think we should follow Coreutils' lead[0] and increase
grep's initial buffer size from 32KiB, probably to 128KiB. I will time
with the attached diff on a few systems.

[0] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=v8.22-103-g74ca6e84c

[Attachment: grep-bufsize-increase.diff]

diff --git a/src/grep.c b/src/grep.c
index f4ae5f5..04ac9c9 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -799,7 +799,6 @@ skipped_file (char const *name, bool command_line, bool is_dir)

 static char *buffer;           /* Base of buffer. */
 static size_t bufalloc;                /* Allocated buffer size, counting slop. */
-enum { INITIAL_BUFSIZE = 32768 }; /* Initial buffer size, not counting slop. */
 static int bufdesc;            /* File descriptor. */
 static char *bufbeg;           /* Beginning of user-visible stuff. */
 static char *buflim;           /* Limit of user-visible stuff. */
@@ -812,6 +811,9 @@ static bool skip_nuls;             /* Skip '\0' in data.  */
 static bool skip_empty_lines;  /* Skip empty lines in data.  */
 static uintmax_t totalnl;      /* Total newline count before lastnl. */

+/* Initial buffer size, not counting slop. */
+enum { INITIAL_BUFSIZE = 128 * 1024 };
+
 /* Return VAL aligned to the next multiple of ALIGNMENT.  VAL can be
    an integer or a pointer.  Both args must be free of side effects.  */
 #define ALIGN_TO(val, alignment) \
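Sergiu's original report asks whether the read block size can be configured, and in January 2020 he proposes exposing it as a parameter. In the versions discussed in this thread, grep reads no such setting. The C sketch below only illustrates what such a knob might look like. Several details are hypothetical:

- the environment variable name GREP_READ_SIZE
- the K/M suffix rules
- the 64 MiB cap

The 128 KiB default is the INITIAL_BUFSIZE value from the attached diff. Paul later reports that the value actually installed was 96 KiB.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Default taken from the attached diff.  */
enum { INITIAL_BUFSIZE = 128 * 1024 };

/* Hypothetical cap so that a typo cannot request a huge buffer.  */
#define MAX_READ_SIZE ((size_t) 64 * 1024 * 1024)

/* Parse a size such as "65536", "768K" or "1M" (K = 1024 bytes,
   M = 1024 * 1024 bytes).  Return 0 if TEXT is NULL, malformed,
   zero, or larger than MAX_READ_SIZE.  */
static size_t
parse_read_size (char const *text)
{
  /* Require a leading digit; this also rejects "-5", which strtoull
     would otherwise accept and negate.  */
  if (!text || !isdigit ((unsigned char) *text))
    return 0;

  errno = 0;
  char *end;
  unsigned long long n = strtoull (text, &end, 10);
  if (errno == ERANGE)
    return 0;

  unsigned long long scale = 1;
  if (*end == 'K' || *end == 'k')
    {
      scale = 1024;
      end++;
    }
  else if (*end == 'M' || *end == 'm')
    {
      scale = 1024 * 1024;
      end++;
    }
  if (*end != '\0' || n == 0 || n > MAX_READ_SIZE / scale)
    return 0;

  size_t size = (size_t) (n * scale);
  assert (size <= MAX_READ_SIZE);
  return size;
}

/* Read size to use: the hypothetical GREP_READ_SIZE environment
   variable if it parses, otherwise INITIAL_BUFSIZE.  */
static size_t
configured_read_size (void)
{
  size_t n = parse_read_size (getenv ("GREP_READ_SIZE"));
  return n ? n : INITIAL_BUFSIZE;
}
```

With this sketch, a user with the 12-disk array from the thread could set the variable to 768K, while everyone else would keep the compiled-in default. Rejecting malformed values, rather than guessing, keeps a typo from silently changing I/O behavior.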
bug-grep@HIDDEN: bug#32073; Package grep.
From: Dennis Clarke <dclarke@HIDDEN>
Date: Fri, 6 Jul 2018 18:44:36 -0400
Subject: Re: bug#32073: Improvements in Grep

On 07/06/2018 06:06 PM, Paul Eggert wrote:
> Sergiu Hlihor wrote:
>> Given my tests for such scenarios, a read block size of at least
>> 512KB would be way more efficient.
>
> Does stdio do this already? If not, why not? How could grep reasonably
> configure a good block size?

This seems to be a very specific complaint which is only of value on a
very specific system and usage case. There is no way that grep could
configure a "good block size" unless it were tailor built.

Doesn't seem to be a reasonable RFE. In my opinion.

Dennis
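One possible middle ground on the quoted question of how grep could pick a block size without being tailor built, offered as a sketch rather than what grep does: ask the kernel for the file's preferred I/O size via the `st_blksize` field from fstat(2), then clamp it between a fixed floor and a ceiling. The floor could be the 128KiB discussed in this thread. `pick_bufsize` and its limits are hypothetical. Note that `st_blksize` is only a hint. On ext4 it is typically 4096, and it need not reflect a RAID array's stripe width.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: choose a read buffer size for FD.  The file
   system's preferred I/O size (st_blksize) is used as a hint, clamped
   to [MIN_SIZE, MAX_SIZE].  If fstat fails, MIN_SIZE is returned.  */
size_t pick_bufsize(int fd, size_t min_size, size_t max_size)
{
  assert(0 < min_size && min_size <= max_size);
  size_t size = min_size;
  struct stat st;
  if (fstat(fd, &st) == 0 && st.st_blksize > 0
      && (size_t) st.st_blksize > size)
    size = (size_t) st.st_blksize;
  return size < max_size ? size : max_size;
}
```

With a 128 KiB floor, this returns 128 KiB on a typical ext4 volume. It returns something larger only where the file system advertises a bigger preferred size. Whether that helps on a large RAID array is the open question in this thread.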
bug-grep@HIDDEN: bug#32073; Package grep.
From: Paul Eggert <eggert@HIDDEN>
Date: Fri, 6 Jul 2018 15:06:34 -0700
Subject: Re: bug#32073: Improvements in Grep

Sergiu Hlihor wrote:
> Given my tests for such scenarios, a read block size of at least
> 512KB would be way more efficient.

Does stdio do this already? If not, why not? How could grep reasonably
configure a good block size?
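For comparison, stdio lets a program choose a stream's buffer size with the standard `setvbuf` function. It must be called after the stream is opened and before any other operation on it. The default size is implementation-defined. The sketch below reads a file through a fully buffered stream with a caller-chosen buffer. The helper name `count_lines_with_buffer` is hypothetical. The sketch passes its own buffer because at least some C libraries ignore the size argument when the buffer pointer is NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count the newlines in the file at PATH, reading
   it through a fully buffered stdio stream with a BUFSIZE-byte buffer.
   Returns -1 on error.  */
long count_lines_with_buffer(const char *path, size_t bufsize)
{
  assert(bufsize > 0);
  FILE *fp = fopen(path, "r");
  if (!fp)
    return -1;
  /* Supply our own buffer: some C libraries ignore the size when the
     buffer argument is NULL.  setvbuf must precede any other I/O.  */
  char *buf = malloc(bufsize);
  if (!buf || setvbuf(fp, buf, _IOFBF, bufsize) != 0)
    {
      fclose(fp);
      free(buf);
      return -1;
    }
  long lines = 0;
  int c;
  while ((c = getc(fp)) != EOF)
    if (c == '\n')
      lines++;
  int failed = ferror(fp);
  /* Close the stream before freeing the buffer it was using.  */
  fclose(fp);
  free(buf);
  return failed ? -1 : lines;
}
```

For example, `count_lines_with_buffer(path, 512 * 1024)` uses the 512 KiB block size suggested in the original report. Whether a given program benefits depends on whether it reads through stdio at all.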
bug-grep@HIDDEN: bug#32073; Package grep.
From: Sergiu Hlihor <sh@HIDDEN>
Date: Fri, 6 Jul 2018 18:26:17 +0200
Subject: Improvements in Grep

Hello,
I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While
grepping over large files I've noticed Grep is painfully slow. The
bottleneck seems to be the read block which is extremely low (looks like
64KB). For large files residing over big HDD RAID arrays, this request
barely reaches one drive and based on CPU usage, grep is idling more or
less. Given my tests for such scenarios, a read block size of at least
512KB would be way more efficient. It's very likely that optimum would be
1MB+. Also, such increase in buffer size would also benefit slightly SSDs
where maximum sequential throughput is usually achieved when reading at
256KB+ block size.
If this is already possible in newer versions or configurable, I'd
appreciate some hints about the new version which contains or about the way
I can configure it to increase the read block size.

Thanks and best regards,
Sergiu
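Here is a back-of-the-envelope model for the RAID point in the message above. It rests on assumptions that do not come from the message: a striped array with chunk (stripe unit) size c spread over N data drives, and a sequential request of R bytes that starts on a chunk boundary. Such a request touches D(R) drives, as below. The values c = 64 KiB and N = 8 are illustrative, not the reporter's configuration.

```latex
\begin{aligned}
D(R) &= \min\left(N, \left\lceil R / c \right\rceil\right) \\
D(64\,\mathrm{KiB}) &= \min(8, \lceil 64/64 \rceil) = 1 \\
D(512\,\mathrm{KiB}) &= \min(8, \lceil 512/64 \rceil) = 8 \\
D(1\,\mathrm{MiB}) &= \min(8, \lceil 1024/64 \rceil) = 8
\end{aligned}
```

Under these assumptions, a 64 KiB request keeps only one drive busy at a time, while a request of a full stripe or more can engage all of them. Real arrays also depend on kernel read-ahead and the controller, so this is only a rough guide.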
Sergiu Hlihor <sh@HIDDEN>: bug-grep@HIDDEN.
bug-grep@HIDDEN: bug#32073; Package grep.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.