GNU bug report logs - #29668
grep: Fatal problem with (big) file

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: grep; Reported by: pg <pasi.vitsa@HIDDEN>; dated Mon, 11 Dec 2017 22:03:02 UTC; Maintainer for grep is bug-grep@HIDDEN.

Message received at 29668 <at> debbugs.gnu.org:


To: Jason Franklin <jrf@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Subject: Re: Possible bug with handling -I option
Organization: UCLA Computer Science Department
Date: Thu, 2 Jan 2020 00:54:42 -0800
Cc: 33552 <at> debbugs.gnu.org, 29668 <at> debbugs.gnu.org

Jason, thanks for reporting this grep bug <https://bugs.gnu.org/33552>. It
strikes me that this is related to another grep bug <https://bugs.gnu.org/29668>
concerning the "Binary files ..." message. Although they're not the same bug,
it's likely that fixing one will also entail fixing the other. So I'll add a
message to both bug reports to this effect.




Information forwarded to bug-grep@HIDDEN:
bug#29668; Package grep. Full text available.

Message received at 29668 <at> debbugs.gnu.org:


Date: Sat, 16 Dec 2017 09:25:59 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Subject: Re: bug#29668: grep: Fatal problem with (big) file
Cc: 29668 <at> debbugs.gnu.org, toimitus@HIDDEN, webmaster@HIDDEN,
 pg <pasi.vitsa@HIDDEN>


On Wed, 13 Dec 2017 16:03:57 -0800
Paul Eggert <eggert@HIDDEN> wrote:

> [...]
> I believe that in the past I've thought that the "Binary file" message should be sent to stdout, but these examples are a reasonably compelling reason to send them to stderr instead.

In addition, the following problem can also occur.

$ printf 'Binary file a.txt matches\n' >a.txt
$ env LC_ALL=en_US.utf8 grep B a.txt
Binary file a.txt matches

$ printf '\xFFB\n' >a.txt
$ env LC_ALL=en_US.utf8 grep B a.txt
Binary file a.txt matches

Both commands produce the same output.  However, the former displays the
contents of the matched line, whereas the latter does not.  If the "Binary
file" message is sent to stdout, a user cannot tell whether a.txt is a text
file or a binary file without opening the file.
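One way to sidestep this ambiguity, whichever stream the notice goes to, is to make grep treat both files as text with `-a` (`--binary-files=text`). The sketch below assumes GNU grep and a POSIX shell; it writes the second sample to a hypothetical `b.txt` instead of overwriting `a.txt`, and uses `od` only to make the raw bytes visible.

```shell
# Sample 1: an ordinary text line that merely looks like grep's notice.
printf 'Binary file a.txt matches\n' > a.txt
# Sample 2 (hypothetical file name): byte 0xFF, invalid in UTF-8, then "B".
# \377 is the octal form of 0xFF, which POSIX printf understands.
printf '\377B\n' > b.txt

# With -a, grep prints the matching line itself for both files,
# so the two outputs differ and reflect the real file contents.
grep -a B a.txt
grep -a B b.txt | od -An -tx1
```

The first command prints the text line unchanged; the second shows the bytes `ff 42 0a`. The trade-off is that `-a` can write raw binary data to a terminal, so it suits pipelines better than interactive use.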





Information forwarded to bug-grep@HIDDEN:
bug#29668; Package grep. Full text available.

Message received at 29668 <at> debbugs.gnu.org:


Subject: Re: bug#29668: grep: Fatal problem with (big) file
To: Norihiro Tanaka <noritnk@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Date: Wed, 13 Dec 2017 16:03:57 -0800
Cc: 29668 <at> debbugs.gnu.org, toimitus@HIDDEN, webmaster@HIDDEN,
 pg <pasi.vitsa@HIDDEN>

On 12/13/2017 03:25 PM, Norihiro Tanaka wrote:
> I don't seem that that's problem.  the user pass output of grep to wc -l,
> so `Binary file ... matches' line is also counted by `wc' as one line.

The intent of 'grep PATTERN | wc -l' is to count the number of matches, 
like 'grep -c PATTERN' would. But it doesn't work that way here. E.g., 
on Fedora 27 with LANG=en_US.UTF-8:

$ grep -c Volvo Tieliikenne5.0.csv
266175
$ grep Volvo Tieliikenne5.0.csv | wc -l
241264
$ grep Volvo Tieliikenne5.0.csv | tail -n 1
Binary file Tieliikenne5.0.csv matches

If the "Binary file ... matches" line were sent to stderr instead of to 
stdout, the problem would be more obvious to the user:

$ grep -c Volvo Tieliikenne5.0.csv
266175
$ grep Volvo Tieliikenne5.0.csv | wc -l
Binary file Tieliikenne5.0.csv matches
241264
$ grep Volvo Tieliikenne5.0.csv | tail -n 1
Binary file Tieliikenne5.0.csv matches
T;2017-09-29;75;01;;;19550000;;;;;1;1570;;3000;2595;1670;;01;2200;20.6;4;false;false;Volvo;;;;;01;;01;977;;;841;;5092946

I believe that in the past I've thought that the "Binary file" message 
should be sent to stdout, but these examples are a reasonably compelling 
reason to send them to stderr instead.
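While the notice is written to stdout, a pipeline count can be made to agree with `grep -c`, either by letting grep do the counting or by forcing text mode with `-a` so that every matching line reaches `wc`. A minimal sketch, assuming GNU grep; `sample.csv` is a hypothetical four-line file whose third line begins with byte 0xE4 (`ä` in ISO-8859-1), which is not valid UTF-8 on its own:

```shell
# Build the hypothetical sample; \344 is octal for byte 0xE4.
printf 'Volvo A\nVolvo B\n\344Volvo C\nVolvo D\n' > sample.csv

# grep -c counts every matching line, even when grep treats the file as binary.
grep -c Volvo sample.csv

# With -a no output is suppressed, so wc -l sees the same number of lines.
grep -a Volvo sample.csv | wc -l
```

Both commands report 4 for this sample regardless of locale, which mirrors the 266175 that `grep -c` reported for the real file.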




Information forwarded to bug-grep@HIDDEN:
bug#29668; Package grep. Full text available.

Message received at 29668 <at> debbugs.gnu.org:


Date: Thu, 14 Dec 2017 08:25:26 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Subject: Re: bug#29668: grep: Fatal problem with (big) file
Cc: 29668 <at> debbugs.gnu.org, toimitus@HIDDEN, webmaster@HIDDEN,
 pg <pasi.vitsa@HIDDEN>


On Tue, 12 Dec 2017 16:28:09 -0800
Paul Eggert <eggert@HIDDEN> wrote:

> On 12/11/2017 03:36 PM, Norihiro Tanaka wrote:
> > Perhaps, characters not to be able to recognize in your locale included
> > in Tieliikenne 5.0.csv and volvot.csv are included.
> 
> Yes, that's the problem. The original 'grep' output ended in "Binary file Tieliikenne5.0.csv matches" but the user didn't see that. Perhaps we should send that diagnostic to stderr as well.

I don't think that's the problem.  The user passes the output of grep to
wc -l, so the `Binary file ... matches' line is also counted by `wc' as one line.

$ env LC_ALL=C grep 'Volvo' Tieliikenne\ 5.0.csv | wc -l
266175
$ env LC_ALL=en_US.utf8 grep 'Volvo' Tieliikenne\ 5.0.csv | wc -l
241264
$ env LC_ALL=en_US.utf8 grep 'Volvo' Tieliikenne\ 5.0.csv | tail -1
Binary file Tieliikenne 5.0.csv matches

$ env LC_ALL=C grep N3 volvot.csv | wc -l
17822
$ env LC_ALL=en_US.utf8 grep N3 volvot.csv | wc -l
11741
$ env LC_ALL=en_US.utf8 grep N3 volvot.csv | tail -1
Binary file volvot.csv matches
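The locale dependence shown above can be reproduced on a tiny file. This sketch assumes GNU grep and an installed UTF-8 locale named `en_US.UTF-8` (the name varies by system, and if it is missing grep silently falls back to the C locale); `demo.csv` is a hypothetical sample whose third line contains byte 0xE4, which is not valid UTF-8 on its own.

```shell
# Hypothetical sample data; \344 is octal for byte 0xE4.
printf 'Volvo A\nVolvo B\n\344Volvo C\nVolvo D\n' > demo.csv

# C locale: every byte is a valid character, so all four matching lines are printed.
LC_ALL=C grep Volvo demo.csv | wc -l

# UTF-8 locale: in the grep versions discussed in this thread, grep prints
# matching lines until it reaches one with an encoding error, then suppresses
# the rest and reports a binary-file match. Whether that notice goes into
# the pipe (stdout) or to the terminal (stderr) depends on the grep version.
LC_ALL=en_US.UTF-8 grep Volvo demo.csv | wc -l
```

A mismatch between the two counts on real data is a quick sign of this class of problem.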





Information forwarded to bug-grep@HIDDEN:
bug#29668; Package grep. Full text available.

Message received at 29668 <at> debbugs.gnu.org:


Subject: Re: bug#29668: grep: Fatal problem with (big) file
To: Norihiro Tanaka <noritnk@HIDDEN>, pg <pasi.vitsa@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Date: Tue, 12 Dec 2017 16:28:09 -0800
Cc: 29668 <at> debbugs.gnu.org, toimitus@HIDDEN, webmaster@HIDDEN

On 12/11/2017 03:36 PM, Norihiro Tanaka wrote:
> Perhaps, characters not to be able to recognize in your locale included
> in Tieliikenne 5.0.csv and volvot.csv are included.

Yes, that's the problem. The original 'grep' output ended in "Binary 
file Tieliikenne5.0.csv matches" but the user didn't see that. Perhaps 
we should send that diagnostic to stderr as well.





Information forwarded to bug-grep@HIDDEN:
bug#29668; Package grep. Full text available.

Message received at 29668 <at> debbugs.gnu.org:


Date: Tue, 12 Dec 2017 08:36:36 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
To: pg <pasi.vitsa@HIDDEN>
Subject: Re: bug#29668: grep: Fatal problem with (big) file
Cc: 29668 <at> debbugs.gnu.org, toimitus@HIDDEN, webmaster@HIDDEN


On Mon, 11 Dec 2017 23:45:25 +0200
pg <pasi.vitsa@HIDDEN> wrote:

> $ awk '/Volvo/' Tieliikenne5.0.csv | wc -l
> 266175
> $ grep Volvo Tieliikenne5.0.csv | wc -l
> 1638

> $ awk '/N3/' volvot.csv | wc -l
> 17822
> $ grep N3 volvot.csv | wc -l
> 1701

Perhaps Tieliikenne 5.0.csv and volvot.csv contain characters that are not
valid in your locale.  Try the commands below.

--
$ env LC_ALL=C grep 'Volvo' Tieliikenne\ 5.0.csv | wc -l
266175

or

$ grep -a 'Volvo' Tieliikenne\ 5.0.csv | wc -l
266175

--
$ env LC_ALL=C grep N3 volvot.csv | wc -l
17822

or

$ grep -a N3 volvot.csv | wc -l
17822
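The workarounds above bypass the check rather than explain it. To find the lines that trigger it, one approach, sketched here assuming GNU grep and an installed `en_US.UTF-8` locale, relies on `.` not matching an invalid byte in a UTF-8 locale: `grep -axv '.*'` then selects exactly the lines that are not made entirely of valid characters, and `-a` keeps grep from stopping at the first one. `iconv` gives a locale-independent yes/no answer for the whole file. `check.csv` is a hypothetical sample.

```shell
# Hypothetical sample data with one invalid UTF-8 byte (0xE4, octal \344) on line 3.
printf 'Volvo A\nVolvo B\n\344Volvo C\nVolvo D\n' > check.csv

# -n: line numbers; -a: treat as text; -x -v '.*': print the lines that '.*'
# cannot match in full, i.e. lines with encoding errors in this locale.
# grep exits with status 1 when it selects nothing.
if ! LC_ALL=en_US.UTF-8 grep -naxv '.*' check.csv; then
  echo "no encoding errors found (or the locale is not installed)"
fi

# iconv exits with a nonzero status on input that is not valid UTF-8.
if iconv -f UTF-8 -t UTF-8 check.csv > /dev/null 2>&1; then
  echo "check.csv is valid UTF-8"
else
  echo "check.csv contains invalid UTF-8"
fi
```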





Information forwarded to bug-grep@HIDDEN:
bug#29668; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Subject: grep: Fatal problem with (big) file
From: pg <pasi.vitsa@HIDDEN>
To: bug-grep@HIDDEN
Date: Mon, 11 Dec 2017 23:45:25 +0200
Cc: toimitus@HIDDEN, webmaster@HIDDEN

Hello!

$ awk '/Volvo/' Tieliikenne5.0.csv | wc -l
266175
$ grep Volvo Tieliikenne5.0.csv | wc -l
1638
$ echo $?   (same result after running "grep Volvo Tieliikenne5.0.csv" alone)
0
$ ack Volvo Tieliikenne5.0.csv | wc -l
266175

The file contains 5 million lines. It is the vehicle database dump of Finland:
http://trafiopendata.97.fi/opendata/171009_Tieliikenne_5_0.zip

$ uname -a
Linux pg-desktop 4.10.0-40-generic #44~16.04.1-Ubuntu SMP Thu Nov 9 15:37:44 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Fatal error with a "small" file too:
$ awk '/Volvo/' Tieliikenne5.0.csv > volvot.csv
$ awk '/N3/' volvot.csv | wc -l
17822
$ grep N3 volvot.csv | wc -l
1701
$ wc -l volvot.csv
266175 volvot.csv

BR
pg

PS: Ubuntu webmaster - please add the error-report address to your system
and forward this message?
PPS: toimitus - I really did know how to grep before ;-)
PPPS: pointer error again? use perl or die!




Acknowledgement sent to pg <pasi.vitsa@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-grep@HIDDEN. Full text available.
Report forwarded to bug-grep@HIDDEN:
bug#29668; Package grep. Full text available.
Last modified: Thu, 2 Jan 2020 09:00:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.