GNU bug report logs - #22059
grep -E: unexpected behaviour

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: grep; Severity: wishlist; Reported by: Charles <c@HIDDEN>; dated Mon, 30 Nov 2015 07:23:02 UTC; Maintainer for grep is bug-grep@HIDDEN.
Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 22059 <at> debbugs.gnu.org:


Subject: Re: bug#22059: grep -E: unexpected behaviour
To: Charles <c@HIDDEN>, 22059 <at> debbugs.gnu.org
References: <565BD753.7020507@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Message-ID: <565C86FD.70909@HIDDEN>
Date: Mon, 30 Nov 2015 09:27:25 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <565BD753.7020507@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

On 11/29/2015 08:57 PM, Charles wrote:
> Apparently grep silently stops processing when it encounters the invalid UTF-8:

The regular expression "." matches a single character, and ".*" matches 
a string of characters. In your example, there is an encoding error, and 
encoding errors are not characters so "." and ".*" do not match them. I 
don't see any bug here.
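
The distinction drawn above can be modelled outside grep. The sketch below is a rough Python model, not grep's implementation: it decodes a line as UTF-8, keeps each undecodable byte as a lone surrogate (Python's surrogateescape error handler), and stands in for "." with a character class that excludes those surrogates. The sample line is shortened from the report, and the names VALID_CHAR and only_matching are made up for illustration.

```python
import re

# Stand-in for grep's "." in a UTF-8 locale: any validly encoded character
# except newline. Bytes that fail to decode become U+DC80..U+DCFF under
# surrogateescape, so excluding that range excludes encoding errors.
VALID_CHAR = "[^\udc80-\udcff\n]"


def only_matching(line: bytes, tail: str):
    """Return the text matched by 'The string .*<tail>', or None.

    Hypothetical helper: ".*" is modelled as a run of VALID_CHAR.
    """
    text = line.decode("utf-8", errors="surrogateescape")
    pattern = "The string " + VALID_CHAR + "*" + re.escape(tail)
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Shortened sample: byte 0x82 is a continuation byte with no lead byte,
# so it is an encoding error rather than a character.
sample = b"The string `TSSTcorp CDDVDW SHQ\x82e' is not valid UTF-8."

print(repr(only_matching(sample, " ")))   # 'The string `TSSTcorp CDDVDW '
print(repr(only_matching(sample, " i")))  # None
```

In this model the run of valid characters cannot cross the 0x82 byte, so a pattern ending in a space can still match before it, while a pattern ending in " i" would have to reach " is not valid", which lies beyond the encoding error.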

> When the input has invalid characters so grep cannot process it, a message could be expected

That's a good suggestion, yes.
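
One way such a message might look, sketched in Python as a separate pre-check rather than as anything grep itself does. The function names, the message wording, and the quiet parameter (standing in for the -s/--no-messages idea from the report) are all assumptions made for illustration.

```python
import sys


def find_encoding_errors(lines, encoding="utf-8"):
    """Yield (line_number, byte_offset) for each line that fails to decode."""
    for number, line in enumerate(lines, start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte.
            yield number, err.start


def warn_about_encoding_errors(lines, name, quiet=False, stream=sys.stderr):
    """Report undecodable lines on `stream` and return how many were found.

    `quiet` is a hypothetical switch that suppresses the messages.
    """
    count = 0
    for number, offset in find_encoding_errors(lines):
        count += 1
        if not quiet:
            print(f"{name}:{number}: encoding error at byte {offset}",
                  file=stream)
    return count


# Usage with in-memory lines; a real caller would read a file in binary mode.
example = [b"an ordinary line\n", b"SHQ\x82e\n"]
warn_about_encoding_errors(example, "syslog.1")
```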




Information forwarded to bug-grep@HIDDEN:
bug#22059; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Message-ID: <565BD753.7020507@HIDDEN>
Date: Mon, 30 Nov 2015 10:27:55 +0530
From: Charles <c@HIDDEN>
MIME-Version: 1.0
To: bug-grep@HIDDEN
Subject: grep -E: unexpected behaviour
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

As expected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'

But append an i to the pattern and the behaviour is unexpected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* i' /var/log/syslog.1
[no output]

Apparently grep silently stops processing when it encounters the invalid UTF-8:

# grep -E --only-matching 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | tail -1
udisksd[2650]: The string `TSSTcorp CDDVDW

In case the specific unusual characters are relevant, here they are in hex:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | head -1 | cut --delimiter=' ' --fields=10-11 | od -x
0000000 4853 8251 f265 88d0 b120 b8d3 4dbe e655
0000020 45ed e8b3 e342 4cc4 0a27
0000032
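
The dump above can be turned back into bytes to see where the UTF-8 breaks. The following Python sketch assumes the od -x words were printed on a little-endian host, which is consistent with the text starting "SHQ"; the helper names are invented for illustration.

```python
import struct

# Words copied from the od -x output above, offsets omitted.
OD_WORDS = (
    "4853 8251 f265 88d0 b120 b8d3 4dbe e655 "
    "45ed e8b3 e342 4cc4 0a27"
)


def od_words_to_bytes(words: str) -> bytes:
    """Rebuild the byte stream from 16-bit hex words.

    Assumption: the words were printed on a little-endian machine,
    so the low byte of each word comes first in the data.
    """
    return b"".join(struct.pack("<H", int(w, 16)) for w in words.split())


def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first undecodable byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


raw = od_words_to_bytes(OD_WORDS)
offset = first_invalid_utf8_offset(raw)
print(raw[:offset])      # b'SHQ'
print(hex(raw[offset]))  # 0x82
```

Under that assumption the first three bytes are the ASCII text "SHQ", and the fourth byte, 0x82, is a UTF-8 continuation byte with no lead byte before it.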

When the input has invalid characters so grep cannot process it, a message could be expected, perhaps configurable by the -s/--no-messages option, because the input is (sort of) unreadable.

Version: 2.20 from the Debian Jessie package 2.20-4.1

Charles





Acknowledgement sent to Charles <c@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-grep@HIDDEN. Full text available.
Report forwarded to bug-grep@HIDDEN:
bug#22059; Package grep. Full text available.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.