GNU bug report logs - #32073
Improvements in Grep

Package: grep; Severity: wishlist; Reported by: Sergiu Hlihor <sh@HIDDEN>; dated Fri, 6 Jul 2018 21:32:02 UTC; Maintainer for grep is bug-grep@HIDDEN.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 2 Jan 2020 01:04:17 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 20:04:17 2020
Received: from localhost ([127.0.0.1]:37835 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imouO-0004E8-SV
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 20:04:17 -0500
Received: from mail-io1-f47.google.com ([209.85.166.47]:44138)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <sh@HIDDEN>) id 1imouM-0004Dv-Q6
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 20:04:15 -0500
Received: by mail-io1-f47.google.com with SMTP id b10so36954283iof.11
 for <32073 <at> debbugs.gnu.org>; Wed, 01 Jan 2020 17:04:14 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=discovergy-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=uG0zcpsFJMb40VAoaqgG4yBxN9fQ8HWatjcq1WBdwuI=;
 b=MrT5OWrM9nJE49cTUjxs8k/CxT7nbY4ZeVQEGSTjEnMFfbQgATGf6icSTcK75Z88No
 nNl+qTwFLLBjZattlCjmMwjNt8ZavrfHuQJQJUOMBpTmDoB6y+kw/Hp3G5lBJ5zuSawo
 EgkmrtKl6uGtcn+GLpXN0/U+qbL7M2RfFYL30m0JYOBRix5Yt95amdM6LpKCvddxzao8
 nXZRyxNjdFAEBlTNx2e9ItM8eCid8K/Yu+gbtEl6aMmyh5FuwU7GaMLAjGGObUIGWqkc
 jiMxWWi+Zp/GIXeZKmkeOuZwGz8xt9iuBOC6w/J19PbEJagxok0z8tZD2+9n/HZuWW9E
 HHJw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=uG0zcpsFJMb40VAoaqgG4yBxN9fQ8HWatjcq1WBdwuI=;
 b=nUyERW3t+ZcnzUWltGBTcmkQR4kKsjbsyF320UpOEb5933zi5sJoVEw7z0JDP1rDfV
 FTeC7XyNHNGI7zX8rQnDkOhKs+tPCFRX4SomGFhkhIFuuEJtT4/IQpPGFpRIsuicQifn
 +hPRNqytX/ulsOZJL5Le0w8fTXV03dHuosziGZqMBPDJsG824Czh51KM0ijQf+VaEYXY
 3QsP9zH3EufrihVbr0jprdN/b43SMG7JsgGJUa1NL1pDcGpUJ1z0KAlrEiFptwzDtaTQ
 JhYwBaCQcWUuTW97ch2C4GPYKSXlMCHKoPoYuufa8T2zMwi/+UMknLMiip3u+qcHP9sT
 QQLA==
X-Gm-Message-State: APjAAAUBj/1bVqH9LaAq+VlcVHbDLEec28q59Knq8uW/Ze9lRNRmT/M+
 p+Zbru5g2e8Y9UmFhGS3x4ih5Z7nQhhWzjs0qBtrMQ==
X-Google-Smtp-Source: APXvYqxTMks5Ajdq0T7iEzgJjZoWJOPNuJpms5hMMgKkp0KY1CrYSQcHExV4liYzv2ybrV18DBFtVhqQwDumVAvR104=
X-Received: by 2002:a5e:8505:: with SMTP id i5mr50080878ioj.158.1577927049287; 
 Wed, 01 Jan 2020 17:04:09 -0800 (PST)
MIME-Version: 1.0
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
 <299d76d3-09d2-8c4d-3b1f-0b2205c03db7@HIDDEN>
 <CAD-3cdeARpf+yBqSf0uF00Y3z6xrRksjz-5CarqrgPiEXnH_Mw@HIDDEN>
 <CA+8g5KEEqcTjV3k+50y4SNhUrrhwO4ACtUuM5PDeRHaaBRAKBg@HIDDEN>
In-Reply-To: <CA+8g5KEEqcTjV3k+50y4SNhUrrhwO4ACtUuM5PDeRHaaBRAKBg@HIDDEN>
From: Sergiu Hlihor <sh@HIDDEN>
Date: Thu, 2 Jan 2020 02:03:58 +0100
Message-ID: <CAD-3cddJmwBTqozvJcJerc8tRXcv0-2Pf0aePe2yhkJaSOY+vA@HIDDEN>
Subject: Re: Improvements in Grep (Bug#32073)
To: Jim Meyering <jim@HIDDEN>
Content-Type: multipart/alternative; boundary="000000000000412dba059b1dc5f9"
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org, Paul Eggert <eggert@HIDDEN>,
 Dennis Clarke <dclarke@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

--000000000000412dba059b1dc5f9
Content-Type: text/plain; charset="UTF-8"

Hi Jim,
The system for which this hurts me the most is an Ubuntu 14.04 where I'd
need to run it as a separate binary. As I'm not familiar with the way it's
built, are there any guidelines on how to build it from source? I'd happily
build it with ever larger block sizes and test.

On Thu, 2 Jan 2020 at 01:51, Jim Meyering <jim@HIDDEN> wrote:

> On Wed, Jan 1, 2020 at 12:04 PM Sergiu Hlihor <sh@HIDDEN> wrote:
> > Paul, I have to correct you. On a production server you have usually a
> > mix of applications many times including databases. For databases, having a
> > read ahead means one IO less since usually database access patterns are
> > random reads. Here actually best is to disable completely read ahead. In
> > fact, I do have to say that probably best is to disable completely read
> > ahead and let applications deal with it, either in an automatic fashion,
> > like reading the optimal IO block size from device or in a configurable
> > way with defaults good enough for today's servers. If you now configure the
> > OS to do a read ahead hitting all HDDs then you induce potentially
> > unnecessary IO load for all applications which use it, which when having
> > HDDs is totally unacceptable. That's why the best is to be application
> > specific and ideally configured to use optimal IO block size.
> >
> > So no, letting OS to do it is stupid.
> >
> > On Wed, 1 Jan 2020 at 20:42, Paul Eggert <eggert@HIDDEN> wrote:
> >>
> >> On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
> >> > If you rely on OS, then
> >> > you are at the mercy of whatever read ahead configuration you have.
> >>
> >> Right, and whatever changes you make to the OS and its read-ahead configuration
> >> will work for all applications, not just for 'grep'. So, change the OS to do
> >> that. There shouldn't be a need to change 'grep' in particular (or 'cp' in
> >> particular, or 'awk' in particular, etc.).
> >>
> >> > The issue of large
> >> > block sizes for IO operations is widespread across all tools from Linux,
> >> > like rsync or cp and its only getting worse
> >>
> >> Quite right. And it would be painful to have to modify all those tools, and to
> >> maintain those modifications. So modify the OS instead. Scheduling read-ahead is
> >> really the OS's job anyway.
>
> Hi Sergiu,
>
> If you would like to help make grep use larger buffer sizes, please
> run and report benchmarks measuring how much of a difference it would
> make, at least for your hardware. Here are some of the tests I ran to
> justify raising it from ~32k to ~96k:
> https://lists.gnu.org/archive/html/grep-devel/2018-10/msg00002.html
>


--000000000000412dba059b1dc5f9--




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 2 Jan 2020 00:51:21 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 19:51:20 2020
Received: from localhost ([127.0.0.1]:37827 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imohs-0003uy-Ks
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 19:51:20 -0500
Received: from mail-wr1-f67.google.com ([209.85.221.67]:38805)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <meyering@HIDDEN>) id 1imohq-0003ul-0s
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 19:51:18 -0500
Received: by mail-wr1-f67.google.com with SMTP id y17so37907645wrh.5
 for <32073 <at> debbugs.gnu.org>; Wed, 01 Jan 2020 16:51:17 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=+G2u1+TYKprGWirYES7YqULTKruNQnNn0zOaJRjBkZQ=;
 b=DJpehv2iK45sVa2pCWwsjuhscGIgi7Vi+JiDPqoKcUIob746N1bEwKed6Zz4uBTo9J
 79eVt33udjV2xpDpaBAbUI0+JClV5SM+w5iEsbV0baXoAD+PkggvHJlJyD4hIVd2kP4O
 O0dkvRo5s161Ji2xmGe4jjxgLfiZs1Tlbt1ZM4yEdEJ/XvBYVJa1fMgNdtC4bHDmtth8
 EcfBurLtE+kUPbjWpdJJ223Xz9gRhcVjLod4RgxiZCFORQDSHSQmGkwjHQGytLv2NjD+
 xotEpLdrbGq5KeOyV4w0qtm/f0wbzXmHQE3rgggERy8/QH+o63Pu6RGSU76ILOYZ8Enk
 GflA==
X-Gm-Message-State: APjAAAUvGuD5mRkDz7QdxDJYD/M5KrwuoEZQvDe9YthH/7iwGE96hNvJ
 zm+pZHTKXMSPjejYb6YEoyW91XAaUVX51yl+rUE=
X-Google-Smtp-Source: APXvYqx0LgCp9iag1ABv/x0dAx+wgdhnP1u3ZR3gJKeHrLjJoDlAoey26zheesXxnwVnWVakWy8OeBIjdeOkucRiy2M=
X-Received: by 2002:a5d:670a:: with SMTP id o10mr82667154wru.227.1577926272259; 
 Wed, 01 Jan 2020 16:51:12 -0800 (PST)
MIME-Version: 1.0
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
 <299d76d3-09d2-8c4d-3b1f-0b2205c03db7@HIDDEN>
 <CAD-3cdeARpf+yBqSf0uF00Y3z6xrRksjz-5CarqrgPiEXnH_Mw@HIDDEN>
In-Reply-To: <CAD-3cdeARpf+yBqSf0uF00Y3z6xrRksjz-5CarqrgPiEXnH_Mw@HIDDEN>
From: Jim Meyering <jim@HIDDEN>
Date: Wed, 1 Jan 2020 16:51:00 -0800
Message-ID: <CA+8g5KEEqcTjV3k+50y4SNhUrrhwO4ACtUuM5PDeRHaaBRAKBg@HIDDEN>
Subject: Re: Improvements in Grep (Bug#32073)
To: Sergiu Hlihor <sh@HIDDEN>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.5 (/)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org, Paul Eggert <eggert@HIDDEN>,
 Dennis Clarke <dclarke@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.5 (/)

On Wed, Jan 1, 2020 at 12:04 PM Sergiu Hlihor <sh@HIDDEN> wrote:
> Paul, I have to correct you. On a production server you have usually a
> mix of applications many times including databases. For databases, having a
> read ahead means one IO less since usually database access patterns are
> random reads. Here actually best is to disable completely read ahead. In
> fact, I do have to say that probably best is to disable completely read
> ahead and let applications deal with it, either in an automatic fashion,
> like reading the optimal IO block size from device or in a configurable
> way with defaults good enough for today's servers. If you now configure the
> OS to do a read ahead hitting all HDDs then you induce potentially
> unnecessary IO load for all applications which use it, which when having
> HDDs is totally unacceptable. That's why the best is to be application
> specific and ideally configured to use optimal IO block size.
>
> So no, letting OS to do it is stupid.
>
> On Wed, 1 Jan 2020 at 20:42, Paul Eggert <eggert@HIDDEN> wrote:
>>
>> On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
>> > If you rely on OS, then
>> > you are at the mercy of whatever read ahead configuration you have.
>>
>> Right, and whatever changes you make to the OS and its read-ahead configuration
>> will work for all applications, not just for 'grep'. So, change the OS to do
>> that. There shouldn't be a need to change 'grep' in particular (or 'cp' in
>> particular, or 'awk' in particular, etc.).
>>
>> > The issue of large
>> > block sizes for IO operations is widespread across all tools from Linux,
>> > like rsync or cp and its only getting worse
>>
>> Quite right. And it would be painful to have to modify all those tools, and to
>> maintain those modifications. So modify the OS instead. Scheduling read-ahead is
>> really the OS's job anyway.

Hi Sergiu,

If you would like to help make grep use larger buffer sizes, please
run and report benchmarks measuring how much of a difference it would
make, at least for your hardware. Here are some of the tests I ran to
justify raising it from ~32k to ~96k:
https://lists.gnu.org/archive/html/grep-devel/2018-10/msg00002.html
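
A benchmark of the kind requested here could start from something like
the following Python sketch. It is not the test from the linked
grep-devel post: the function names and the list of buffer sizes are
assumptions chosen for illustration. It times one sequential pass over
a file per buffer size, using unbuffered reads so that each read() call
maps to one request of that size. The timings only mean something if
the file is not already in the page cache (for example, a file larger
than RAM, or caches dropped between runs).

```python
import time


def time_sequential_read(path, bufsize):
    """Read the whole file with unbuffered reads of `bufsize` bytes.

    Returns (bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 gives a raw file object, so Python adds no buffer of its own.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_buffer_sizes(path, sizes=(32 * 1024, 96 * 1024, 512 * 1024, 1024 * 1024)):
    """Time one pass per buffer size; the sizes are illustrative assumptions."""
    return {size: time_sequential_read(path, size) for size in sizes}
```

Calling compare_buffer_sizes("/path/to/large/file") returns a mapping
from buffer size to (bytes read, seconds), which can be turned into a
throughput figure per size.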




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 1 Jan 2020 21:46:15 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 16:46:15 2020
Received: from localhost ([127.0.0.1]:37671 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imlol-0001m9-FE
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 16:46:15 -0500
Received: from lists.gnu.org ([209.51.188.17]:36204)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <pj@HIDDEN>) id 1imlok-0001m2-Fw
 for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 16:46:14 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:41740)
 by lists.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <pj@HIDDEN>) id 1imloi-0006aO-S5
 for bug-grep@HIDDEN; Wed, 01 Jan 2020 16:46:14 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.1 required=5.0 tests=BAYES_50,RCVD_IN_DNSWL_LOW,
 URIBL_BLOCKED autolearn=disabled version=3.3.2
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <pj@HIDDEN>) id 1imloh-0007Jc-Mw
 for bug-grep@HIDDEN; Wed, 01 Jan 2020 16:46:12 -0500
Received: from out3-smtp.messagingengine.com ([66.111.4.27]:42797)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <pj@HIDDEN>) id 1imloh-0007J0-EU
 for bug-grep@HIDDEN; Wed, 01 Jan 2020 16:46:11 -0500
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
 by mailout.nyi.internal (Postfix) with ESMTP id BB67E2234B
 for <bug-grep@HIDDEN>; Wed,  1 Jan 2020 16:46:10 -0500 (EST)
Received: from imap34 ([10.202.2.84])
 by compute1.internal (MEProxy); Wed, 01 Jan 2020 16:46:10 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=content-type:date:from:in-reply-to
 :message-id:mime-version:references:subject:to:x-me-proxy
 :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; bh=1lZVA+
 i/aNbISUaTQxnlsayXO9m5ai4v70uzaoJnjf8=; b=h4D19IOsSFh+M6g73+sQnr
 QJG90tT+P2IiguwhZhb1Ft+nsk5aE/8bGTNpL3vOcKJspn2deBc/jEbiLX9Gp2qe
 DOzYXhVUH6OGVvHnIGulN9GUguvgqNfbt9UC5vqdkr6jLuXK9RyT6pyTrD38acU6
 RmmdYhMOVi6F89BVZApfBhtsbiePo3ERZfNauGOEeGqpE5FQ6B7Rg6J42akfU7/J
 w3Fh5UZ2zPeBILfSh56hlaY69HAGwaI0GFb8iwZIrXhs6eTLJg1lyipZwV1jCn3i
 Y9KKzGRr89E2NV6ZnEELGqkL8mOJr0iUFhtq1e3AiDeHdd/SEiaFHOkJrvmyzuOQ
 ==
X-ME-Sender: <xms:IhMNXpWnRJ8_D0d6Q75r6NYCiOBc9wRyE33jvNlp5eppKTmzDWcT6A>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedufedrvdefledgudehvdcutefuodetggdotefrod
 ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh
 necuuegrihhlohhuthemuceftddtnecunecujfgurhepofgfggfkjghffffhvffutgesth
 dtredtreertdenucfhrhhomhepfdfrrghulhculfgrtghkshhonhdfuceophhjsehushgr
 rdhnvghtqeenucfrrghrrghmpehmrghilhhfrhhomhepphhjsehushgrrdhnvghtnecuve
 hluhhsthgvrhfuihiivgeptd
X-ME-Proxy: <xmx:IhMNXhdD0kzmSybXt2dzC1gQokhSxjkTsH3J3riofFD4b23xIQrFBg>
 <xmx:IhMNXkyatBxRUL1f-XHhfNs5ux4dRQa1ZXLldBSWuZ-OEzSQZEBPdQ>
 <xmx:IhMNXkGC57X5lY8sT2gC5JPxg_gYNc_apfsbQvWFBF2Xunw9FkhrtQ>
 <xmx:IhMNXijfb0OrKNYVO-2PbTaHwlcr35ywIVbCFoifMjmbUVg0Xcisaw>
Received: by mailuser.nyi.internal (Postfix, from userid 501)
 id 3B42A1460061; Wed,  1 Jan 2020 16:46:10 -0500 (EST)
X-Mailer: MessagingEngine.com Webmail Interface
User-Agent: Cyrus-JMAP/3.1.7-694-gd5bab98-fmstable-20191218v1
Mime-Version: 1.0
Message-Id: <a0744545-50e1-4e11-b200-2fac405c7260@HIDDEN>
In-Reply-To: <CAD-3cdeARpf+yBqSf0uF00Y3z6xrRksjz-5CarqrgPiEXnH_Mw@HIDDEN>
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
 <299d76d3-09d2-8c4d-3b1f-0b2205c03db7@HIDDEN>
 <CAD-3cdeARpf+yBqSf0uF00Y3z6xrRksjz-5CarqrgPiEXnH_Mw@HIDDEN>
Date: Wed, 01 Jan 2020 15:45:54 -0600
From: "Paul Jackson" <pj@HIDDEN>
To: bug-grep@HIDDEN
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
Content-Type: text/plain
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
 [fuzzy]
X-Received-From: 66.111.4.27
X-Spam-Score: -1.6 (-)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.6 (--)

From my old Unix fart view point, Paul (the other Paul)
is herding a hundred GNU cats, small command line utilities,
many of which date their origins back to the 1970's, many of
which have over the years grown their own internal i/o routines
with specific performance specializations, but few of which
have much in the way of user customizable i/o blocking and
read-ahead customizations.

Except for the last decade, those commands spent almost
their entire lives running off spinning rust platters, which
grew (immensely) in size over the years, but which did not
change much in other performance characteristics. 

Those commands are in general not well suited to adapting to
provide maximally optimal performance across the recent
generation of storage devices, with their much more varied
performance characteristics.

I'm guessing that Sergiu has some specific needs that it seems
that grep meets, except that grep (like its hundred cat siblings)
lacks the tunable i/o characteristics needed to get maximum
performance across a rapidly evolving variety of these more
recent kinds of storage.

What I've done in situations such as I suspect Sergiu finds
himself in is to code up a custom utility, that met my specific
needs, when I had higher performance demands, while
continuing to make extensive use of the general purpose
classic Unix/Linux command line utilities that Paul E. now
herds.

I can't imagine that it would make sense to attempt to recode
a hundred classic GNU utilities to each be intelligently adaptable
goats/pigs/cats/dogs/cows/bison/... depending on the i/o
terrain they were running on.

Many many thanks to Paul E. for herding these cats all these
many years.  I hope my weird comments do not cause him even
the slightest distress.

(The word "cat" above refers to four legged felines, not to the
concatenate command line utility.)

-- 
                Paul Jackson
                pj@HIDDEN




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 21:02:49 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 16:02:49 2020
Received: from localhost ([127.0.0.1]:37654 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1iml8j-0000lx-3d
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 16:02:49 -0500
Received: from zimbra.cs.ucla.edu ([131.179.128.68]:48738)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eggert@HIDDEN>) id 1iml8g-0000lf-Gi
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 16:02:47 -0500
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id 1D85D160052;
 Wed,  1 Jan 2020 13:02:39 -0800 (PST)
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id td9dIdw_GCN9; Wed,  1 Jan 2020 13:02:38 -0800 (PST)
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id 7A6AA160054;
 Wed,  1 Jan 2020 13:02:38 -0800 (PST)
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id N3e1b83al3QG; Wed,  1 Jan 2020 13:02:38 -0800 (PST)
Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com
 [23.242.74.103])
 by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 51D6C160052;
 Wed,  1 Jan 2020 13:02:38 -0800 (PST)
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
To: Sergiu Hlihor <sh@HIDDEN>
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
 <299d76d3-09d2-8c4d-3b1f-0b2205c03db7@HIDDEN>
 <CAD-3cdeARpf+yBqSf0uF00Y3z6xrRksjz-5CarqrgPiEXnH_Mw@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Message-ID: <0c596c01-3a43-2651-7de8-50d92ae195a4@HIDDEN>
Date: Wed, 1 Jan 2020 13:02:38 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.2.2
MIME-Version: 1.0
In-Reply-To: <CAD-3cdeARpf+yBqSf0uF00Y3z6xrRksjz-5CarqrgPiEXnH_Mw@HIDDEN>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

On 1/1/20 12:04 PM, Sergiu Hlihor wrote:

> That's why the best is to be application specific

That doesn't mean that one should have to modify every application. One could
instead modify the OS so that it uses different read-ahead heuristics for
different classes of applications. This should be easier to manage than
modifying every individual application.




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 20:24:36 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 15:24:36 2020
Received: from localhost ([127.0.0.1]:37619 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imkXj-0008Ir-VL
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 15:24:36 -0500
Received: from freefriends.org ([96.88.95.60]:49340)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1imkXi-0008Ik-D9
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 15:24:34 -0500
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (freefriends.org [96.88.95.60])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 001KOQ9E012802
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Wed, 1 Jan 2020 13:24:27 -0700
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 001KOQMn012801;
 Wed, 1 Jan 2020 13:24:26 -0700
From: arnold@HIDDEN
Message-Id: <202001012024.001KOQMn012801@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Wed, 01 Jan 2020 13:24:26 -0700
To: sh@HIDDEN, arnold@HIDDEN
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
 <202001011119.001BJMYA027994@HIDDEN>
 <CAD-3cdeVbf3TVwFyj7NFd5d5_gTXugTb8_=x9aTjGE4+ufHggQ@HIDDEN>
In-Reply-To: <CAD-3cdeVbf3TVwFyj7NFd5d5_gTXugTb8_=x9aTjGE4+ufHggQ@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org, eggert@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.9 (/)

Hi.

Sergiu Hlihor <sh@HIDDEN> wrote:

> Arnold, there is no need to write user code, it is already done in
> benchmarks. One of the standard benchmarks when testing HDDs and SSDs is
> read throughput vs block size and at different queue depths.

I think you're misunderstanding me, or I am misunderstanding you.

As the gawk maintainer, I can choose the buffer size to use every time
I issue a read(2) system call for any given input file.  Gawk currently
uses the smaller of (a) the file's size or (b) the st_blksize member of
the file's struct stat.
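
The rule described above can be sketched in a few lines of Python. This
is an illustration of "smaller of file size and st_blksize", not gawk's
actual code; the function name, the 4096-byte fallback, and the handling
of empty files are assumptions.

```python
import os


def choose_read_size(path):
    """Return the smaller of the file's size and its st_blksize."""
    st = os.stat(path)
    # st_blksize is the filesystem's preferred I/O block size. It is not
    # available on every platform, so fall back to an assumed 4096 bytes.
    blksize = getattr(st, "st_blksize", 0) or 4096
    if st.st_size > 0:
        return min(st.st_size, blksize)
    # Assumption: for empty files (or files reporting size 0) use the block size.
    return blksize
```

Note that st_blksize comes from the filesystem, so it does not
necessarily reflect the request size at which the underlying device
performs best, which is the gap this thread is about.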

If I understand you correctly, this is "not enough"; gawk (grep,
cp, etc.) should all use an optimal buffer size that depends upon the
underlying storage hardware where the file is located.

So far, so good, except for: How do I determine what that number is?
I cannot run a benchmark before opening each and every file. I don't
know of a system call that will give me that number. (If there is,
please point me to it.)

Do you just want a command line option or environment variable
that you, as the application user, can set?

If the latter, it happens that gawk will let you set AWKBUFSIZE and
it will use whatever number you supply for doing reads. (This is
even documented.)
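
For a tool without such a knob, a user-settable override of this kind
might look like the Python sketch below. It does not reproduce how gawk
parses AWKBUFSIZE: the variable name TOOL_BUFSIZE, the default value,
and the choice to ignore invalid values are all hypothetical.

```python
import os

# Assumed default, roughly the ~96k figure mentioned elsewhere in this thread.
DEFAULT_BUFSIZE = 96 * 1024


def buffer_size_from_env(var="TOOL_BUFSIZE", default=DEFAULT_BUFSIZE, environ=None):
    """Return the read buffer size, letting an environment variable override it.

    `var` is a hypothetical variable name. Missing, non-numeric, or
    non-positive values fall back to `default`.
    """
    env = os.environ if environ is None else environ
    raw = env.get(var)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default
```

The design choice here is to fail soft: a bad value silently yields the
default rather than aborting, which suits a filter used in pipelines but
can hide typos in the setting.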

HTH,

Arnold




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 20:04:59 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 15:04:59 2020
Received: from localhost ([127.0.0.1]:37607 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imkEk-0007qg-W0
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 15:04:59 -0500
Received: from mail-il1-f174.google.com ([209.85.166.174]:38191)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <sh@HIDDEN>) id 1imkEi-0007qS-HO
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 15:04:57 -0500
Received: by mail-il1-f174.google.com with SMTP id f5so32700534ilq.5
 for <32073 <at> debbugs.gnu.org>; Wed, 01 Jan 2020 12:04:56 -0800 (PST)
MIME-Version: 1.0
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
 <299d76d3-09d2-8c4d-3b1f-0b2205c03db7@HIDDEN>
In-Reply-To: <299d76d3-09d2-8c4d-3b1f-0b2205c03db7@HIDDEN>
From: Sergiu Hlihor <sh@HIDDEN>
Date: Wed, 1 Jan 2020 21:04:39 +0100
Message-ID: <CAD-3cdeARpf+yBqSf0uF00Y3z6xrRksjz-5CarqrgPiEXnH_Mw@HIDDEN>
Subject: Re: Improvements in Grep (Bug#32073)
To: Paul Eggert <eggert@HIDDEN>
Content-Type: multipart/alternative; boundary="000000000000dcbab1059b199639"
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org, Dennis Clarke <dclarke@HIDDEN>,
 Jim Meyering <jim@HIDDEN>

--000000000000dcbab1059b199639
Content-Type: text/plain; charset="UTF-8"

Paul, I have to correct you. A production server usually runs a mix of
applications, very often including databases. For databases, read-ahead
means wasted I/O, since database access patterns are usually random
reads; there it is best to disable read-ahead completely. In fact, I
would say it is probably best to disable read-ahead entirely and let
applications deal with it, either automatically, for example by reading
the optimal I/O block size from the device, or through a configurable
setting with defaults good enough for today's servers. If you configure
the OS to do a read-ahead that hits all HDDs, you potentially induce
unnecessary I/O load for every application that uses them, which with
HDDs is totally unacceptable. That is why the best approach is to be
application specific, ideally configured to use the optimal I/O block
size.

So no, letting the OS do it is stupid.
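
As a rough illustration of what "letting the application deal with it"
could look like, the sketch below reads a file in large blocks and gives
the kernel a sequential-access hint for that one file descriptor. The
function name and the 1 MiB default are assumptions; the hint is only
advisory, and its effect depends on the kernel and filesystem.

```python
import os

def read_sequential(path, block_size=1 << 20):
    """Yield the file's contents in chunks of at most block_size bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # posix_fadvise is Unix-only and purely advisory; skip it quietly
        # where it is missing or rejected for this descriptor.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                pass
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:
                break
            yield chunk
    finally:
        os.close(fd)
```

The hint applies to this descriptor alone, so a database running on the
same machine keeps whatever read-ahead policy it has chosen for its own
files.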

On Wed, 1 Jan 2020 at 20:42, Paul Eggert <eggert@HIDDEN> wrote:

> On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
> > If you rely on OS, then
> > you are at the mercy of whatever read ahead configuration you have.
>
> Right, and whatever changes you make to the OS and its read-ahead
> configuration will work for all applications, not just for 'grep'. So,
> change the OS to do that. There shouldn't be a need to change 'grep' in
> particular (or 'cp' in particular, or 'awk' in particular, etc.).
>
> > The issue of large
> > block sizes for IO operations is widespread across all tools from Linux,
> > like rsync or cp and its only getting worse
>
> Quite right. And it would be painful to have to modify all those tools,
> and to maintain those modifications. So modify the OS instead. Scheduling
> read-ahead is really the OS's job anyway.
>

--000000000000dcbab1059b199639--




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 19:43:04 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 14:43:04 2020
Received: from localhost ([127.0.0.1]:37595 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imjtY-0007LK-0Q
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 14:43:04 -0500
Received: from zimbra.cs.ucla.edu ([131.179.128.68]:43186)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eggert@HIDDEN>) id 1imjtV-0007Kk-3q
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 14:43:01 -0500
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id BF9D2160052;
 Wed,  1 Jan 2020 11:42:54 -0800 (PST)
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id xp1ZcUe4sLgB; Wed,  1 Jan 2020 11:42:54 -0800 (PST)
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id 1B474160054;
 Wed,  1 Jan 2020 11:42:54 -0800 (PST)
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id K68Jkv66INS6; Wed,  1 Jan 2020 11:42:54 -0800 (PST)
Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com
 [23.242.74.103])
 by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id E1E10160052;
 Wed,  1 Jan 2020 11:42:53 -0800 (PST)
Subject: Re: Improvements in Grep (Bug#32073)
To: Sergiu Hlihor <sh@HIDDEN>
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Message-ID: <299d76d3-09d2-8c4d-3b1f-0b2205c03db7@HIDDEN>
Date: Wed, 1 Jan 2020 11:42:53 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.2.2
MIME-Version: 1.0
In-Reply-To: <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org, Dennis Clarke <dclarke@HIDDEN>,
 Jim Meyering <jim@HIDDEN>

On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
> If you rely on OS, then
> you are at the mercy of whatever read ahead configuration you have.

Right, and whatever changes you make to the OS and its read-ahead configuration
will work for all applications, not just for 'grep'. So, change the OS to do
that. There shouldn't be a need to change 'grep' in particular (or 'cp' in
particular, or 'awk' in particular, etc.).

> The issue of large
> block sizes for IO operations is widespread across all tools from Linux,
> like rsync or cp and its only getting worse

Quite right. And it would be painful to have to modify all those tools, and to
maintain those modifications. So modify the OS instead. Scheduling read-ahead is
really the OS's job anyway.
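
For reference, a small sketch of how the OS-level setting discussed here
can be inspected on Linux, where each block device exposes its read-ahead
size under sysfs as queue/read_ahead_kb. This is Linux-specific; the
helper name and the device name "sda" in the usage note are illustrative
assumptions.

```python
from pathlib import Path

def read_ahead_kb(device, sysfs_root="/sys/block"):
    """Return the device's read-ahead size in KiB, or None if unavailable."""
    attr = Path(sysfs_root) / device / "queue" / "read_ahead_kb"
    try:
        return int(attr.read_text().strip())
    except (OSError, ValueError):
        # Not Linux, no such device, or an unexpected file format.
        return None
```

Calling read_ahead_kb("sda") on a Linux machine with such a device
returns the current value; changing it requires root privileges and
affects every application that reads from that device.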




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 19:07:11 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 14:07:11 2020
Received: from localhost ([127.0.0.1]:37583 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imjKo-0006V4-Aa
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 14:07:11 -0500
Received: from mail-il1-f169.google.com ([209.85.166.169]:47082)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <sh@HIDDEN>) id 1imjKm-0006Us-9C
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 14:07:09 -0500
Received: by mail-il1-f169.google.com with SMTP id t17so32599947ilm.13
 for <32073 <at> debbugs.gnu.org>; Wed, 01 Jan 2020 11:07:08 -0800 (PST)
MIME-Version: 1.0
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
 <202001011119.001BJMYA027994@HIDDEN>
In-Reply-To: <202001011119.001BJMYA027994@HIDDEN>
From: Sergiu Hlihor <sh@HIDDEN>
Date: Wed, 1 Jan 2020 20:06:39 +0100
Message-ID: <CAD-3cdeVbf3TVwFyj7NFd5d5_gTXugTb8_=x9aTjGE4+ufHggQ@HIDDEN>
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
To: arnold@HIDDEN
Content-Type: multipart/alternative; boundary="000000000000204a27059b18c80b"
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org, Paul Eggert <eggert@HIDDEN>

--000000000000204a27059b18c80b
Content-Type: text/plain; charset="UTF-8"

Arnold, there is no need to write user code; it is already done in
benchmarks. One of the standard benchmarks when testing HDDs and SSDs is
read throughput vs. block size at different queue depths. Take a look
at this:
https://www.servethehome.com/wp-content/uploads/2019/12/Corsair-Force-MP600-1TB-ATTO.jpg
In this benchmark, at queue depth 4 and 128KB block size, the SSD was not
yet able to achieve its maximum throughput of 5GB/s. Moreover, if you
extrapolate the results to a queue depth of 1, you get about 1.2GB/s out
of over 5GB/s theoretical. Therefore, for this particular model you need
to issue read requests of at least 512KB to achieve maximum throughput.
With hard drives I already explained the issue. I have a production
server where the HDD RAID array can theoretically do 2.5GB/s, and I see
sustained read speeds over 500MB/s when large block sizes are used for
reads, yet when I use grep I get a practical bandwidth of 20 to 50 MB/s.
Moreover, when it comes to HDDs the math is quite simple; here it is for
a standard HDD at 7200 RPM, 240MB/s:
7200 RPM => 120 revolutions per second
240 MB/s at 120 revolutions per second => 2MB per revolution
One revolution time = 1000/120 => 8.33 ms
Read throughput per ms = 240KB

Worst case scenario: each read request requires a full revolution to
reach the data (head positioning is done concurrently and can be
ignored).
Seek time (rotational latency): 8.33ms
At 96KB:
 - Read time: 96 / 240 = 0.4ms
 - Total read latency = 8.33 + 0.4 = 8.73ms, read throughput = 1000 /
8.73 * 96KB = 11MB/s
At 512KB:
 - Read time: 512 / 240 = 2.13ms
 - Total read latency = 8.33 + 2.13 = 10.46ms, read throughput = 1000 /
10.46 * 512KB = 49MB/s
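
The per-request arithmetic above can be reproduced with a short script.
This is a sketch of the same simplified model (a fixed latency per
request followed by a transfer at the media rate), using the 7200 RPM and
240MB/s figures from the text; the function name and the extra 2048KB
data point are illustrative additions.

```python
def hdd_throughput_mb_s(block_kb, latency_ms, media_mb_s=240.0):
    """Effective MB/s when every request pays latency_ms before transferring."""
    transfer_ms = block_kb / media_mb_s        # 240 MB/s is 240 KB per ms
    requests_per_s = 1000.0 / (latency_ms + transfer_ms)
    return requests_per_s * block_kb / 1000.0  # KB/s -> MB/s

revolution_ms = 60_000 / 7200                  # one revolution at 7200 RPM
for block_kb in (96, 512, 2048):
    worst = hdd_throughput_mb_s(block_kb, revolution_ms)        # full turn
    average = hdd_throughput_mb_s(block_kb, revolution_ms / 2)  # half turn
    print(f"{block_kb:5d} KB: worst {worst:6.1f} MB/s, "
          f"average {average:6.1f} MB/s")
```

The model ignores drive caches, command queuing and competing I/O, so it
only shows the trend: the fixed latency is amortized over more data as
the block size grows.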
In practice, average seek latencies are 4.16ms, so throughput is roughly
double for small reads. This is the cold hard reality.

When each one of you is testing, you are very likely deceived by testing
on *one hdd, on an idle system*, where nothing else, such as a database,
is consuming IO in the background. In such an ideal scenario you do see
240MB/s, because HDDs also read ahead: by the time the data is
transferred over the interface and consumed, the next chunk is in the
buffer and can be delivered with apparent 0 seek time. This means the
first read takes 4ms and the next ones take 0.1ms.

With a *HDD RAID array on a server where your IO is always at 50% load*,
if you have a stripe size of 128KB or more, you are hitting one drive at
a time, each one with a penalty of 4.16ms. And due to the constant load,
by the time you hit the first HDD again, the read-ahead buffer maintained
by the HDD itself has been discarded, so all reads go directly to the
physical medium. If, however, you hit all HDDs at the same time, you
benefit from the HDD's read-ahead for one or more cycles, giving reads
with apparent 0 latency and a much higher average bandwidth. The cost of
reading from all HDDs at the same time is potential extra latency for all
other running applications; this is why the value should be configurable,
so that the best value can be set based on the hardware.

The issue of large block sizes for IO operations is widespread across all
Linux tools, such as rsync or cp, and it's only getting worse, to the
extent that in my company we are considering writing our own tools for
something that should have worked out of the box.

One side issue, which I mention because I'm not aware of the
implementation details: as we are getting into GB/s territory, reading is
best done in its own thread, which then serves the output to the
processing thread. With SSDs that can do multiple GB/s this matters.
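
To make the striping argument concrete, here is a sketch that counts how
many members of a striped array a single read touches. It assumes a plain
RAID-0-style layout in which consecutive stripe-sized chunks rotate
across the data members, and it treats the 12-disk RAID 10 mentioned in
this thread as 6 striped mirror pairs; real controllers may lay data out
differently, and the function name is illustrative.

```python
def members_touched(offset, length, stripe_size, data_members):
    """Return the set of stripe members hit by a read of [offset, offset+length)."""
    if length <= 0:
        return set()
    first_chunk = offset // stripe_size
    last_chunk = (offset + length - 1) // stripe_size
    # Consecutive chunks rotate round-robin across the data members.
    return {chunk % data_members for chunk in range(first_chunk, last_chunk + 1)}

KB = 1024
for size_kb in (96, 128, 768):
    hit = members_touched(0, size_kb * KB, 128 * KB, 6)
    print(f"{size_kb:4d} KB read touches {len(hit)} of 6 members")
```

Under these assumptions, an aligned read no larger than one stripe keeps
a single member busy, while a read spanning six stripes reaches all six.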




On Wed, 1 Jan 2020 at 12:19, <arnold@HIDDEN> wrote:

> As a quite serious question, how is someone writing user-level code
> supposed to be able to figure out the right buffer size for a particular
> file, and to do so portably? ("Show me the code.")
>
> Gawk bases its reads on the st_blksize member in struct stat.  That will
> typically be something like 4K - not nearly enough, given your description
> below.
>
> Arnold
>
> Sergiu Hlihor <sh@HIDDEN> wrote:
>
> > This topic is getting more and more frustrating. If you rely on OS, then
> > you are at the mercy of whatever read ahead configuration you have. And
> > read ahead is typically 128KB so does not help that much. A HDD RAID 10
> > array with 12 disks and a strip size of 128KB reaches the maximum read
> > throughput if read block size is 6 * 128 = 768KB. When issuing read
> > requests with 128KB , you only hit one HDD, having 1/6 read throughput.
> > With flash the same. A state of the art SSD that can do 5GB/s reads can
> > actually do around 1GB/s or less at 128KB block size. Why is so hard to
> > understand how hardware works and the fact that you need huge block sizes
> > to actually read at full speed? Why not just exposing the read buffer
> > size as a configurable parameter, then anyone can just tune it as
> > needed? 96KB is purely retarded.
> >
> > On Wed, 1 Jan 2020 at 08:52, Paul Eggert <eggert@HIDDEN> wrote:
> >
> > > > This makes me think we should follow Coreutils' lead[0] and increase
> > > > grep's initial buffer size from 32KiB, probably to 128KiB.
> > >
> > > I see that Jim later installed a patch increasing it to 96 KiB.
> > >
> > > Whatever number is chosen, it's "wrong" for some configuration. And I
> > > suppose the particular configuration that Sergiu Hlihor mentioned
> > > could be tweaked so that it worked better with grep (and with other
> > > programs).
> > >
> > > I'm inclined to mark this bug report as a wishlist item, in the sense
> > > that it'd be nice if grep and/or the OS could pick buffer sizes more
> > > intelligently (though it's not clear how grep and/or the OS could go
> > > about this).
> > >
>

--000000000000204a27059b18c80b--




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 1 Jan 2020 11:27:57 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 06:27:57 2020
Received: from localhost ([127.0.0.1]:35689 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imcAO-00077b-V7
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 06:27:57 -0500
Received: from lists.gnu.org ([209.51.188.17]:46445)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <pj@HIDDEN>) id 1imcAM-00077P-VV
 for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 06:27:55 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:37966)
 by lists.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <pj@HIDDEN>) id 1imcAL-0007y8-Jv
 for bug-grep@HIDDEN; Wed, 01 Jan 2020 06:27:54 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.1 required=5.0 tests=BAYES_50,RCVD_IN_DNSWL_LOW,
 URIBL_BLOCKED autolearn=disabled version=3.3.2
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <pj@HIDDEN>) id 1imcAK-0000fx-G8
 for bug-grep@HIDDEN; Wed, 01 Jan 2020 06:27:53 -0500
Received: from wout2-smtp.messagingengine.com ([64.147.123.25]:53503)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <pj@HIDDEN>) id 1imcAK-0000cX-6N
 for bug-grep@HIDDEN; Wed, 01 Jan 2020 06:27:52 -0500
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
 by mailout.west.internal (Postfix) with ESMTP id 2567A44F
 for <bug-grep@HIDDEN>; Wed,  1 Jan 2020 06:27:50 -0500 (EST)
Received: from imap34 ([10.202.2.84])
 by compute1.internal (MEProxy); Wed, 01 Jan 2020 06:27:50 -0500
Received: by mailuser.nyi.internal (Postfix, from userid 501)
 id 5E5C11460061; Wed,  1 Jan 2020 06:27:49 -0500 (EST)
X-Mailer: MessagingEngine.com Webmail Interface
User-Agent: Cyrus-JMAP/3.1.7-694-gd5bab98-fmstable-20191218v1
Mime-Version: 1.0
Message-Id: <a59adc1e-64af-44bd-b3aa-8821a7fe354b@HIDDEN>
In-Reply-To: <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
Date: Wed, 01 Jan 2020 05:26:04 -0600
From: "Paul Jackson" <pj@HIDDEN>
To: bug-grep@HIDDEN
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
Content-Type: text/plain
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
 [fuzzy]
X-Received-From: 64.147.123.25
X-Spam-Score: -1.6 (-)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.6 (--)

>>  Why not just expose the read buffer size as a configurable parameter ...

Take a look at the (and I quote) "Hairy buffering mechanism for grep"
input buffering code in the grep source file grep-3.3/src/grep.c, then
you tell me why it's not a runtime variable parameter <grin>.

In other words, the input (and output) i/o buffering and performance
tuning for various situations and kinds of files has been tuned and
refined over many years.  Doing something to the code, such as
making buffer size a run time adjustable parameter, would probably
not be easy, would risk making one usage of grep slower in order
to make some other usage faster, and would risk some nasty bugs.

-- 
                Paul Jackson
                pj@HIDDEN




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 11:19:34 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 06:19:34 2020
Received: from localhost ([127.0.0.1]:35683 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imc2H-0006qy-PG
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 06:19:34 -0500
Received: from freefriends.org ([96.88.95.60]:44578)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1imc2F-0006qq-G8
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 06:19:32 -0500
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (freefriends.org [96.88.95.60])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 001BJN5u027995
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Wed, 1 Jan 2020 04:19:23 -0700
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 001BJMYA027994;
 Wed, 1 Jan 2020 04:19:22 -0700
From: arnold@HIDDEN
Message-Id: <202001011119.001BJMYA027994@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Wed, 01 Jan 2020 04:19:22 -0700
To: sh@HIDDEN, eggert@HIDDEN
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
 <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
In-Reply-To: <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.9 (/)

As a quite serious question, how is someone writing user-level code
supposed to be able to figure out the right buffer size for a particular
file, and to do so portably? ("Show me the code.")

Gawk bases its reads on the st_blksize member in struct stat.  That will
typically be something like 4K - not nearly enough, given your description
below.
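
For illustration only, here is a minimal sketch of the st_blksize approach
described above, written in Python rather than the C that gawk and grep use.
The helper names preferred_block_size and count_bytes, and the 4096-byte
fallback, are assumptions made for this example and are not taken from the
gawk or grep sources.

```python
import os
import tempfile


def preferred_block_size(path, fallback=4096):
    """Return the preferred I/O block size that stat() reports for path.

    st_blksize is only available on platforms that provide it (typically
    Unix-like systems); the fallback value is an assumption for the rest.
    """
    st = os.stat(path)
    return getattr(st, "st_blksize", fallback) or fallback


def count_bytes(path, block_size):
    """Read path sequentially in block_size chunks and return its length."""
    total = 0
    # buffering=0 yields an unbuffered file object, so each read() asks
    # the OS for at most block_size bytes at a time.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 100_000)
    try:
        size = preferred_block_size(tmp.name)
        print("st_blksize:", size)
        print("bytes read:", count_bytes(tmp.name, size))
    finally:
        os.unlink(tmp.name)
```

The value printed for st_blksize depends on the filesystem. It describes the
filesystem's preferred block size, not the geometry of any RAID array
underneath, which is why it can be much smaller than the sizes discussed in
this thread.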

Arnold

Sergiu Hlihor <sh@HIDDEN> wrote:

> This topic is getting more and more frustrating. If you rely on the OS, then
> you are at the mercy of whatever read-ahead configuration you have. And
> read-ahead is typically 128KB, so it does not help that much. An HDD RAID 10
> array with 12 disks and a stripe size of 128KB reaches its maximum read
> throughput only if the read block size is 6 * 128 = 768KB. When issuing read
> requests of 128KB, you only hit one HDD, getting 1/6 of the read throughput.
> The same holds for flash. A state-of-the-art SSD that can do 5GB/s reads can
> actually do around 1GB/s or less at a 128KB block size. Why is it so hard to
> understand how hardware works and the fact that you need huge block sizes
> to actually read at full speed? Why not just expose the read buffer size
> as a configurable parameter, then anyone can just tune it as needed? 96KB
> is purely retarded.
>
> On Wed, 1 Jan 2020 at 08:52, Paul Eggert <eggert@HIDDEN> wrote:
>
> > > This makes me think we should follow Coreutils' lead[0] and increase
> > > grep's initial buffer size from 32KiB, probably to 128KiB.
> >
> > I see that Jim later installed a patch increasing it to 96 KiB.
> >
> > Whatever number is chosen, it's "wrong" for some configuration. And I suppose
> > the particular configuration that Sergiu Hlihor mentioned could be tweaked so
> > that it worked better with grep (and with other programs).
> >
> > I'm inclined to mark this bug report as a wishlist item, in the sense that it'd
> > be nice if grep and/or the OS could pick buffer sizes more intelligently (though
> > it's not clear how grep and/or the OS could go about this).
> >




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 09:15:37 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 04:15:37 2020
Received: from localhost ([127.0.0.1]:35621 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ima6K-0003yV-Uc
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 04:15:37 -0500
Received: from mail-io1-f50.google.com ([209.85.166.50]:34884)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <sh@HIDDEN>) id 1ima6I-0003yF-QY
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 04:15:36 -0500
Received: by mail-io1-f50.google.com with SMTP id v18so35842758iol.2
 for <32073 <at> debbugs.gnu.org>; Wed, 01 Jan 2020 01:15:34 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=discovergy-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=8gHSYNXJGtNZ3y8e9Nw8xBL74BT4jrQGosamVRHAPwE=;
 b=RHY9/EZNzOLQbXdM8yE7Em+XryBaYVTCsL6kzppIApUrQBPapaZIJ+YLRTFKJayFQ5
 zmrCvfR4WiNuREW6XOV3bU590JIE3dcFucwcjYuHFQRB3vsA7728et+Xkxfz3I+JinAj
 kUWosCOKB+hgpJLZfYI5V/GS3pE6lgfqgDmYtR0ywh4e7yMcdCV7ar1YzcggMSnC0qjl
 41d03g7n5dWawEmvqedFvgX0njyaojVViK7++X+q43XLrSvMC2GzLay8RHiLdE+BLir9
 1jxuH13Y+oBsiqwA+wk5X/cxjdqXbvYm685Yyr0QIjaxIVx4ScPaJPZ0AmWLG7Lj6/7W
 yf6Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=8gHSYNXJGtNZ3y8e9Nw8xBL74BT4jrQGosamVRHAPwE=;
 b=a3vUjVYpkkDSnTuJg6qU2hPm2rzPsoe28rpKH9L15sRe78vtWhFXzOxdZgb8kchlH+
 4fRWCRRUR4pjAMNGUq1o+ZRG4N06jmApjKC3b3lafFsk5VIb6S+or5V+xIljQcLwF9EF
 kw1jnOf3gs4qjTFOG7LZHcWY8mtgmef01YYJ4fhj4AwhkY2lJRdoaorZnf8xS4H8/s83
 pppQgvZCmA5J8QSKcnMLaU2/80k2rAvVjwa+vB5gABKR6c8pGXxzVxysyUb4DkyZPtR3
 ww4/tJljbviR29fqVNTARspTgTpLGWwhbuuhKx1ZdFF+aisvS/Z3kN+LQQfbxIIaEOY6
 llBQ==
X-Gm-Message-State: APjAAAVudOcOVwHsECbWShBB7sRiFc0qJqC5DetlwIBP1zWr62tZiCBq
 A9No8KAFnKOj9/qQRQabPR7sLP3AdHjtW0WrYIRhYA==
X-Google-Smtp-Source: APXvYqyqv9lqodjCG3Jkez2qtpMRbIhjIlGYB6bhFu6so69gdZ5KPSB8/uvgzj4GZl/qdlbNcZmgcB8fLcP/2+wj+38=
X-Received: by 2002:a02:864b:: with SMTP id e69mr58953496jai.83.1577870129071; 
 Wed, 01 Jan 2020 01:15:29 -0800 (PST)
MIME-Version: 1.0
References: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
In-Reply-To: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
From: Sergiu Hlihor <sh@HIDDEN>
Date: Wed, 1 Jan 2020 10:15:16 +0100
Message-ID: <CAD-3cdd_r=fV0L2Pw8hQMZAWSot3M12bvR93LY4m7zoaCXijtg@HIDDEN>
Subject: Re: Improvements in Grep (Bug#32073)
To: Paul Eggert <eggert@HIDDEN>
Content-Type: multipart/alternative; boundary="0000000000008b9f5e059b1084ee"
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org, Dennis Clarke <dclarke@HIDDEN>,
 Jim Meyering <jim@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

--0000000000008b9f5e059b1084ee
Content-Type: text/plain; charset="UTF-8"

This topic is getting more and more frustrating. If you rely on the OS, then
you are at the mercy of whatever read-ahead configuration you have. And
read-ahead is typically 128KB, so it does not help that much. An HDD RAID 10
array with 12 disks and a stripe size of 128KB reaches its maximum read
throughput only if the read block size is 6 * 128 = 768KB. When issuing read
requests of 128KB, you only hit one HDD, getting 1/6 of the read throughput.
The same holds for flash. A state-of-the-art SSD that can do 5GB/s reads can
actually do around 1GB/s or less at a 128KB block size. Why is it so hard to
understand how hardware works and the fact that you need huge block sizes
to actually read at full speed? Why not just expose the read buffer size
as a configurable parameter, then anyone can just tune it as needed? 96KB
is purely retarded.
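
The stripe arithmetic above can be checked with a short sketch. It assumes
a simplified RAID 10 model in which the 12 disks form 6 mirrored pairs, each
pair holds one stripe column, and requests are stripe-aligned. The function
names are hypothetical and chosen only for this example.

```python
def full_stripe_read_kib(total_disks, stripe_kib, mirror_copies=2):
    """Smallest aligned read size (KiB) that touches every mirrored set once."""
    if total_disks % mirror_copies != 0:
        raise ValueError("total_disks must be a multiple of mirror_copies")
    data_columns = total_disks // mirror_copies
    return data_columns * stripe_kib


def columns_touched(request_kib, stripe_kib, data_columns):
    """Number of distinct stripe columns covered by one aligned request."""
    chunks = -(-request_kib // stripe_kib)  # ceiling division
    return min(chunks, data_columns)


if __name__ == "__main__":
    disks, stripe = 12, 128
    columns = disks // 2
    print("full-stripe read:", full_stripe_read_kib(disks, stripe), "KiB")
    for request in (128, 256, 768):
        used = columns_touched(request, stripe, columns)
        print(f"{request} KiB request touches {used} of {columns} columns")
```

Under this model a 128KB request covers one of six columns, which matches
the 1/6 figure in the message; real controllers may also serve reads from
both halves of a mirror, which the sketch ignores.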

On Wed, 1 Jan 2020 at 08:52, Paul Eggert <eggert@HIDDEN> wrote:

> > This makes me think we should follow Coreutils' lead[0] and increase
> > grep's initial buffer size from 32KiB, probably to 128KiB.
>
> I see that Jim later installed a patch increasing it to 96 KiB.
>
> Whatever number is chosen, it's "wrong" for some configuration. And I suppose
> the particular configuration that Sergiu Hlihor mentioned could be tweaked so
> that it worked better with grep (and with other programs).
>
> I'm inclined to mark this bug report as a wishlist item, in the sense that it'd
> be nice if grep and/or the OS could pick buffer sizes more intelligently (though
> it's not clear how grep and/or the OS could go about this).
>

--0000000000008b9f5e059b1084ee--




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.
Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 1 Jan 2020 07:53:03 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jan 01 02:53:03 2020
Received: from localhost ([127.0.0.1]:35593 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1imYoR-000097-DS
	for submit <at> debbugs.gnu.org; Wed, 01 Jan 2020 02:53:03 -0500
Received: from zimbra.cs.ucla.edu ([131.179.128.68]:49318)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eggert@HIDDEN>) id 1imYoO-00008c-SA
 for 32073 <at> debbugs.gnu.org; Wed, 01 Jan 2020 02:53:01 -0500
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id 4988716008F;
 Tue, 31 Dec 2019 23:52:55 -0800 (PST)
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id Gxmu9XNl4O-w; Tue, 31 Dec 2019 23:52:54 -0800 (PST)
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id 9C3A716022A;
 Tue, 31 Dec 2019 23:52:54 -0800 (PST)
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id EswyY5VL8zaA; Tue, 31 Dec 2019 23:52:54 -0800 (PST)
Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com
 [23.242.74.103])
 by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 6D60516008F;
 Tue, 31 Dec 2019 23:52:54 -0800 (PST)
To: Sergiu Hlihor <sh@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Subject: Re: Improvements in Grep (Bug#32073)
Message-ID: <5608aabb-ae0e-38e0-8c26-443f764cb53a@HIDDEN>
Date: Tue, 31 Dec 2019 23:52:54 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.2.2
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 32073
Cc: 32073 <at> debbugs.gnu.org, Dennis Clarke <dclarke@HIDDEN>,
 Jim Meyering <jim@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> This makes me think we should follow Coreutils' lead[0] and increase
> grep's initial buffer size from 32KiB, probably to 128KiB.

I see that Jim later installed a patch increasing it to 96 KiB.

Whatever number is chosen, it's "wrong" for some configuration. And I suppose
the particular configuration that Sergiu Hlihor mentioned could be tweaked so
that it worked better with grep (and with other programs).

I'm inclined to mark this bug report as a wishlist item, in the sense that it'd
be nice if grep and/or the OS could pick buffer sizes more intelligently (though
it's not clear how grep and/or the OS could go about this).




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 7 Jul 2018 01:39:13 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Jul 06 21:39:13 2018
Received: from localhost ([127.0.0.1]:48957 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1fbcBs-0000ed-OU
	for submit <at> debbugs.gnu.org; Fri, 06 Jul 2018 21:39:13 -0400
Received: from mail-it0-f49.google.com ([209.85.214.49]:54285)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <sh@HIDDEN>) id 1fbc4L-0000Sc-O0
 for 32073 <at> debbugs.gnu.org; Fri, 06 Jul 2018 21:31:26 -0400
Received: by mail-it0-f49.google.com with SMTP id s7-v6so18707912itb.4
 for <32073 <at> debbugs.gnu.org>; Fri, 06 Jul 2018 18:31:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=discovergy-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=moB9xXnht8Ivaa6H7SGXQS7xXcWeF8ztpA7QCKa2BT4=;
 b=bLqNZb71KPy2mG0vNyuyHEeRYm904p/g6KRsezoGV7fzUqdmYb+kf9BhNAAL2b3uNX
 EZS7Mkdk+wtgo787UcgZCPdzLsgB4Xx4XWz6+DdEV7GlXKDzCciLV+7xZf8CLThTVsqO
 ANycURMEcfIb8XOOKkywhequHiDPzuGjA+mCL8XbTQ85KlCtIy6Wi9m/UaH3DbF6MpQf
 m+iyBtopRtUMcO5vwaLX8jA5Z5mqzvW1z7TQrgzeOR6X0WaWp3964Rn0uRW3JU4i+nOR
 SUfDDvlxAM9Uv5rcCH6QXFHSKysTf6GQLABCezImn7rNgnnu0DfYsJlCAepPO/3DSwyB
 2ajg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=moB9xXnht8Ivaa6H7SGXQS7xXcWeF8ztpA7QCKa2BT4=;
 b=TJo1fVsRWbp+azwzaVnH3qm9j43mqh06Jr6+A8x3WInMKafRmapHzxVSZe0wzqvWki
 0BHrtABV03cUduLLrIAF7VuPO0JhbHPM1z/DW2MxrpbHcbdYc36CkcZ8w4anA9Ugdhy3
 EFm07/0b5RWnq1A3UFDn/hkcc+jl+vx4NguDzsq2vr/pNcb65hBiVMFu5IAgsda6X6jE
 5+KZz2OCcVXGBE18HKL5qXIIc+nQwy0shI3h/qVo4f4ccpy2rgk8jlhv9pW7g9/and0T
 kjMlgfihEsLTcKmyR7DTE3K+pwP9YRZXTXgj7eaWhds+NDExYxO3CW9tcHMFIo1+PVt7
 /sHA==
X-Gm-Message-State: APt69E25xAaOkhKgd8peujqWEJOl4JV0PGYTKYKkK2Fc+0CgdTE5G00N
 74Sf2u9htI5ECihtHlTcygvRb72EZZ9LiVTCX+1yyw==
X-Google-Smtp-Source: AAOMgpcEMTNY6K71KFEu+OBwvA3lpDLV0oMhfUmxaofiJR2wF4p/DaEshJq7+Vz+4+8CU1NGhLNdV/5thsc9LKdvNhY=
X-Received: by 2002:a24:cf57:: with SMTP id
 y84-v6mr10031863itf.98.1530927080155; 
 Fri, 06 Jul 2018 18:31:20 -0700 (PDT)
MIME-Version: 1.0
Received: by 2002:a02:1b98:0:0:0:0:0 with HTTP;
 Fri, 6 Jul 2018 18:31:19 -0700 (PDT)
In-Reply-To: <CA+8g5KFkFjPKLKLAeu8EiiU+pKsu89VKsvbRzc94_0xGShadZA@HIDDEN>
References: <CAD-3cdeVqR_pvxSmayD=5tDpi8Cpze_ck64gssgoYvjV98No9g@HIDDEN>
 <CA+8g5KFkFjPKLKLAeu8EiiU+pKsu89VKsvbRzc94_0xGShadZA@HIDDEN>
From: Sergiu Hlihor <sh@HIDDEN>
Date: Sat, 7 Jul 2018 03:31:19 +0200
Message-ID: <CAD-3cdf6upYf6NjgFTZGHXbz6b-e6wCw+1A=LT8VMZxnK5q-6w@HIDDEN>
Subject: Re: bug#32073: Improvements in Grep
To: Jim Meyering <jim@HIDDEN>
Content-Type: multipart/alternative; boundary="000000000000ca462d05705ebc23"
X-Spam-Score: -0.0 (/)
X-Debbugs-Envelope-To: 32073
X-Mailman-Approved-At: Fri, 06 Jul 2018 21:39:11 -0400
Cc: 32073 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

--000000000000ca462d05705ebc23
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

To add, the increase to 128KiB is good, but for RAID arrays under light to
medium load, this is not sufficient. In a system without any load, the HDD
can read ahead and always serve the next request from its buffer, thus reading
at the full sequential speed of ~200MB/s. In a RAID 10 configuration with 12
HDDs where the stripe size is set to 128KB, every HDD is hit on every 6th
request. There is enough delay between reads hitting the same drive that
the read-ahead buffer often gets discarded, which basically limits the
throughput to max IOPS x buffer size = ~10-20MiB/s for 128KiB.
I have such systems in production environments and I often see read speeds
under 10MiB/s and read await >10ms, which means that the read-ahead buffer is
already discarded. Under the same load conditions, if I read the data using
utilities which can use a 512KiB buffer size, I see read speeds varying between
50 and 400MiB/s. Grep has an average CPU load of 2-3% on the given machine
at such low read speeds, so it could do much more if reading were optimized.
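
As a rough check of the "max IOPS x buffer size" estimate above, the sketch
below multiplies an assumed IOPS figure by the request size. The values of
80 and 160 IOPS are assumptions back-computed from the 10-20MiB/s range in
the message, not measurements, and throughput_mib_s is a hypothetical
helper name.

```python
def throughput_mib_s(iops, request_kib):
    """Throughput in MiB/s when every request costs one I/O operation."""
    return iops * request_kib / 1024


if __name__ == "__main__":
    for request_kib in (128, 512):
        low = throughput_mib_s(80, request_kib)
        high = throughput_mib_s(160, request_kib)
        print(f"{request_kib} KiB requests: {low:.0f}-{high:.0f} MiB/s")
```

In this simple model throughput scales linearly with request size, so going
from 128KiB to 512KiB requests quadruples the estimate. It does not capture
read-ahead hits or requests spanning several drives, which would explain
measured speeds above the model's range.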

On 7 July 2018 at 02:33, Jim Meyering <jim@HIDDEN> wrote:

> On Fri, Jul 6, 2018 at 9:26 AM, Sergiu Hlihor <sh@HIDDEN> wrote:
> > Hello,
> >      I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While
> > grepping over large files I've noticed Grep is painfully slow. The
> > bottleneck seems to be the read block which is extremely low (looks like
> > 64KB). For large files residing over big HDD RAID arrays, this request
> > barely reaches one drive and based on CPU usage, grep is idling more or
> > less. Given my tests for such scenarios, a read block size of at least
> > 512KB would be way more efficient. It's very likely that optimum would be
> > 1MB+. Also, such increase in buffer size would also benefit slightly SSDs
> > where maximum sequential throughput is usually achieved when reading at
> > 256KB+ block size.
> >      If this is already possible in newer versions or configurable, I'd
> > appreciate some hints about the new version which contains or about the way
> > I can configure it to increase the read block size.
>
> Thanks for raising the issue.
> This makes me think we should follow Coreutils' lead[0] and increase
> grep's initial buffer size from 32KiB, probably to 128KiB. I will time
> with the attached diff on a few systems.
>
> [0] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=v8.22-103-g74ca6e84c
>



-- 
_____________________________________________

Senior Software Engineer & Team leader

Phone: +49 (0) 6221 7787-481

Email: sh@HIDDEN

*Discovergy GmbH*
_____________________________________________

Registration court: Amtsgericht Aachen HRB 15391

Managing directors: Ralf Esser | Bernhard Seidl | Nikolaus Starzacher
This e-mail and any attached files are intended only for the recipient
named above and may contain confidential information. If you are not the
intended recipient, any distribution, forwarding, or copying is prohibited.
If you have received this e-mail in error, please return it or promptly
notify the sender using the contact details above. In that case, please
delete this message immediately. Thank you.

--000000000000ca462d05705ebc23--




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 7 Jul 2018 00:33:37 +0000
MIME-Version: 1.0
In-Reply-To: <CAD-3cdeVqR_pvxSmayD=5tDpi8Cpze_ck64gssgoYvjV98No9g@HIDDEN>
References: <CAD-3cdeVqR_pvxSmayD=5tDpi8Cpze_ck64gssgoYvjV98No9g@HIDDEN>
From: Jim Meyering <jim@HIDDEN>
Date: Fri, 6 Jul 2018 17:33:08 -0700
Message-ID: <CA+8g5KFkFjPKLKLAeu8EiiU+pKsu89VKsvbRzc94_0xGShadZA@HIDDEN>
Subject: Re: bug#32073: Improvements in Grep
To: Sergiu Hlihor <sh@HIDDEN>
Content-Type: multipart/mixed; boundary="000000000000e75f1605705ded47"
Cc: 32073 <at> debbugs.gnu.org

--000000000000e75f1605705ded47
Content-Type: text/plain; charset="UTF-8"

On Fri, Jul 6, 2018 at 9:26 AM, Sergiu Hlihor <sh@HIDDEN> wrote:
> Hello,
>      I'm using grep on Ubuntu Server 14.04 (grep version 2.16). While
> grepping large files I've noticed that grep is painfully slow. The
> bottleneck seems to be the read block size, which is extremely small (it
> looks like 64KB). For large files residing on big HDD RAID arrays, such a
> request barely reaches one drive and, judging by CPU usage, grep is mostly
> idle. Given my tests for such scenarios, a read block size of at least
> 512KB would be way more efficient. It's very likely that the optimum would
> be 1MB+. Such an increase in buffer size would also slightly benefit SSDs,
> where maximum sequential throughput is usually achieved when reading at
> 256KB+ block sizes.
>      If this is already possible in newer versions, or is configurable,
> I'd appreciate some hints about which version contains it or how I can
> configure grep to increase the read block size.

Thanks for raising the issue.
This makes me think we should follow Coreutils' lead[0] and increase
grep's initial buffer size from 32KiB, probably to 128KiB. I will time
with the attached diff on a few systems.

[0] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=v8.22-103-g74ca6e84c
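
One way to time a buffer-size change like this is to read the same file
with different buffer sizes and compare the elapsed time. The C sketch
below is only an illustration: drain_fd and time_drain are hypothetical
helper names, not functions from grep, and results for a file that is
already in the page cache mostly reflect system-call overhead rather than
disk throughput.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Read FD to end of file using a buffer of BUFSIZE bytes.
   Return the number of bytes read, or -1 on error.  */
static long long
drain_fd (int fd, size_t bufsize)
{
  assert (bufsize > 0);
  char *buf = malloc (bufsize);
  if (!buf)
    return -1;

  long long total = 0;
  for (;;)
    {
      ssize_t n = read (fd, buf, bufsize);
      if (n == 0)
        break;                  /* end of file */
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted by a signal; retry */
          total = -1;
          break;
        }
      total += n;
    }
  free (buf);
  return total;
}

/* Rewind FD, drain it with BUFSIZE-byte reads, and print the elapsed
   wall-clock time.  Return the number of bytes read, or -1 on error.  */
static long long
time_drain (int fd, size_t bufsize)
{
  struct timespec t0, t1;
  if (lseek (fd, 0, SEEK_SET) < 0)
    return -1;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  long long bytes = drain_fd (fd, bufsize);
  clock_gettime (CLOCK_MONOTONIC, &t1);
  double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
  printf ("bufsize %zu: %lld bytes in %.6f s\n", bufsize, bytes, secs);
  return bytes;
}
```

Calling time_drain (fd, 32768) and then time_drain (fd, 128 * 1024) on
the same seekable descriptor gives a rough comparison of the old and
proposed INITIAL_BUFSIZE values.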

--000000000000e75f1605705ded47
Content-Type: application/octet-stream; name="grep-bufsize-increase.diff"
Content-Disposition: attachment; filename="grep-bufsize-increase.diff"
Content-Transfer-Encoding: 7bit
X-Attachment-Id: f_jjaoc07a0

diff --git a/src/grep.c b/src/grep.c
index f4ae5f5..04ac9c9 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -799,7 +799,6 @@ skipped_file (char const *name, bool command_line, bool is_dir)

 static char *buffer;		/* Base of buffer. */
 static size_t bufalloc;		/* Allocated buffer size, counting slop. */
-enum { INITIAL_BUFSIZE = 32768 }; /* Initial buffer size, not counting slop. */
 static int bufdesc;		/* File descriptor. */
 static char *bufbeg;		/* Beginning of user-visible stuff. */
 static char *buflim;		/* Limit of user-visible stuff. */
@@ -812,6 +811,9 @@ static bool skip_nuls;		/* Skip '\0' in data.  */
 static bool skip_empty_lines;	/* Skip empty lines in data.  */
 static uintmax_t totalnl;	/* Total newline count before lastnl. */

+/* Initial buffer size, not counting slop. */
+enum { INITIAL_BUFSIZE = 128 * 1024 };
+
 /* Return VAL aligned to the next multiple of ALIGNMENT.  VAL can be
    an integer or a pointer.  Both args must be free of side effects.  */
 #define ALIGN_TO(val, alignment) \
--000000000000e75f1605705ded47--




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 6 Jul 2018 22:44:56 +0000
Subject: Re: bug#32073: Improvements in Grep
To: bug-grep@HIDDEN
References: <CAD-3cdeVqR_pvxSmayD=5tDpi8Cpze_ck64gssgoYvjV98No9g@HIDDEN>
 <9be5ca5d-dc30-508f-649b-5146ee85cf5e@HIDDEN>
From: Dennis Clarke <dclarke@HIDDEN>
Message-ID: <d2b7c614-4be5-167e-fce0-3e27d9ce5771@HIDDEN>
Date: Fri, 6 Jul 2018 18:44:36 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <9be5ca5d-dc30-508f-649b-5146ee85cf5e@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit

On 07/06/2018 06:06 PM, Paul Eggert wrote:
> Sergiu Hlihor wrote:
>> Given my tests for such scenarios, a read block size of at least
>> 512KB would be way more efficient.
> 
> Does stdio do this already? If not, why not? How could grep reasonably 
> configure a good block size?

This seems to be a very specific complaint that is only of value for a
very specific system and use case.  There is no way that grep could
configure a "good block size" unless it were tailor-built.  It doesn't
seem to be a reasonable RFE, in my opinion.

Dennis




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at 32073 <at> debbugs.gnu.org:


Received: (at 32073) by debbugs.gnu.org; 6 Jul 2018 22:06:44 +0000
Subject: Re: bug#32073: Improvements in Grep
To: Sergiu Hlihor <sh@HIDDEN>, 32073 <at> debbugs.gnu.org
References: <CAD-3cdeVqR_pvxSmayD=5tDpi8Cpze_ck64gssgoYvjV98No9g@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Message-ID: <9be5ca5d-dc30-508f-649b-5146ee85cf5e@HIDDEN>
Date: Fri, 6 Jul 2018 15:06:34 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <CAD-3cdeVqR_pvxSmayD=5tDpi8Cpze_ck64gssgoYvjV98No9g@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit

Sergiu Hlihor wrote:
> Given my tests for such scenarios, a read block size of at least
> 512KB would be way more efficient.

Does stdio do this already? If not, why not? How could grep reasonably configure 
a good block size?
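
One possible answer to the configuration question is to ask the file
system for a hint. POSIX fstat reports st_blksize, a preferred I/O block
size for the object, and a program could take the larger of that hint and
a fixed floor. The C sketch below assumes this approach; pick_read_size
and MIN_IO_BUFSIZE are hypothetical names, the 128 KiB floor is chosen only
for illustration, and whether the hint reflects the geometry of a RAID
array is not something this sketch establishes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical floor for the read size; not a value taken from grep.  */
enum { MIN_IO_BUFSIZE = 128 * 1024 };

/* Return a read size for FD: the larger of MIN_IO_BUFSIZE and the
   preferred I/O block size reported by fstat.  If fstat fails or
   reports no usable hint, fall back to MIN_IO_BUFSIZE.  */
static size_t
pick_read_size (int fd)
{
  struct stat st;
  size_t hint = 0;

  if (fstat (fd, &st) == 0 && st.st_blksize > 0)
    hint = (size_t) st.st_blksize;

  size_t size = (hint > (size_t) MIN_IO_BUFSIZE
                 ? hint : (size_t) MIN_IO_BUFSIZE);
  assert (size >= (size_t) MIN_IO_BUFSIZE);
  return size;
}
```

With this design the program never reads in units smaller than the floor,
but it can still grow the read size on file systems that report a larger
preferred block size.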




Information forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 6 Jul 2018 21:31:49 +0000
MIME-Version: 1.0
From: Sergiu Hlihor <sh@HIDDEN>
Date: Fri, 6 Jul 2018 18:26:17 +0200
Message-ID: <CAD-3cdeVqR_pvxSmayD=5tDpi8Cpze_ck64gssgoYvjV98No9g@HIDDEN>
Subject: Improvements in Grep
To: bug-grep@HIDDEN
Content-Type: multipart/alternative; boundary="000000000000954fdf0570571f6a"

--000000000000954fdf0570571f6a
Content-Type: text/plain; charset="UTF-8"

Hello,
     I'm using grep on Ubuntu Server 14.04 (grep version 2.16). While
grepping large files I've noticed that grep is painfully slow. The
bottleneck seems to be the read block size, which is extremely small (it
looks like 64KB). For large files residing on big HDD RAID arrays, such a
request barely reaches one drive and, judging by CPU usage, grep is mostly
idle. Given my tests for such scenarios, a read block size of at least
512KB would be way more efficient. It's very likely that the optimum would
be 1MB+. Such an increase in buffer size would also slightly benefit SSDs,
where maximum sequential throughput is usually achieved when reading at
256KB+ block sizes.
     If this is already possible in newer versions, or is configurable,
I'd appreciate some hints about which version contains it or how I can
configure grep to increase the read block size.

Thanks and best regards,
Sergiu
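
The point about a small request reaching only one drive can be made
concrete with a little arithmetic. The C sketch below assumes a simple
striped array in which consecutive chunks of chunk_bytes go to consecutive
drives, and a request that starts on a chunk boundary. The function name
and the example numbers (64 KiB chunks, 8 data drives) are assumptions for
illustration, not details from the report. It also ignores kernel
readahead, which may issue larger requests than the application does.

```c
#include <assert.h>
#include <stdint.h>

/* Number of drives that service one chunk-aligned request of
   REQUEST_BYTES on a striped array with CHUNK_BYTES per chunk and
   DATA_DRIVES data drives.  The request spans
   ceil (REQUEST_BYTES / CHUNK_BYTES) chunks, and each chunk lands on
   the next drive, so the result is capped at DATA_DRIVES.  */
static unsigned
drives_touched (uint64_t request_bytes, uint64_t chunk_bytes,
                unsigned data_drives)
{
  assert (chunk_bytes > 0 && data_drives > 0);
  uint64_t chunks = (request_bytes / chunk_bytes
                     + (request_bytes % chunk_bytes != 0));
  return chunks < data_drives ? (unsigned) chunks : data_drives;
}
```

Under these assumptions, drives_touched (64 * 1024, 64 * 1024, 8) is 1,
while drives_touched (512 * 1024, 64 * 1024, 8) is 8, which matches the
observation that a 64KB read keeps most of the array idle.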

--000000000000954fdf0570571f6a--




Acknowledgement sent to Sergiu Hlihor <sh@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-grep@HIDDEN. Full text available.
Report forwarded to bug-grep@HIDDEN:
bug#32073; Package grep. Full text available.
Last modified: Thu, 2 Jan 2020 01:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.