Paul Eggert <eggert@HIDDEN>
to control <at> debbugs.gnu.org
.
Received: (at 22059) by debbugs.gnu.org; 30 Nov 2015 17:27:29 +0000
Subject: Re: bug#22059: grep -E: unexpected behaviour
To: Charles <c@HIDDEN>, 22059 <at> debbugs.gnu.org
References: <565BD753.7020507@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Message-ID: <565C86FD.70909@HIDDEN>
Date: Mon, 30 Nov 2015 09:27:25 -0800
In-Reply-To: <565BD753.7020507@HIDDEN>

On 11/29/2015 08:57 PM, Charles wrote:
> Apparently grep silently stops processing when it encounters the invalid UTF-8:

The regular expression "." matches a single character, and ".*" matches a string of characters. In your example, there is an encoding error, and encoding errors are not characters so "." and ".*" do not match them. I don't see any bug here.

> When the input has invalid characters so grep cannot process it, a message could be expected

That's a good suggestion, yes.
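The point that "." does not match an encoding error can be checked with a small sketch. It assumes GNU grep and that a UTF-8 locale named en_US.UTF-8 is installed (if it is not, grep silently falls back to the C locale and both counts will agree). The sample line is a shortened, made-up stand-in for the syslog entry, reusing four of the bytes from the report's hex dump; the temporary file is hypothetical.

```shell
#!/bin/sh
# Hypothetical one-line sample. The octal escapes \202 \362 \320 \210 are the
# bytes 0x82 0xf2 0xd0 0x88; 0x82 is a lone continuation byte, so the line is
# not valid UTF-8.
sample=$(mktemp)
printf 'udisksd[2650]: The string SHQ\202e\362\320\210 is not valid UTF-8.\n' > "$sample"

pattern='udisksd\[[[:digit:]]+\]: The string .* i'

# UTF-8 locale: ".*" cannot step over the encoding error, so the count is
# expected to be 0 with GNU grep (assumption: en_US.UTF-8 exists here).
# -a forces text mode so that newer grep versions do not report a binary file.
utf8=$(LC_ALL=en_US.UTF-8 grep -a -E -c "$pattern" "$sample" || true)
echo "matching lines in UTF-8 locale: $utf8"

# C locale: every byte is a character, so ".*" spans the bad bytes.
bytewise=$(LC_ALL=C grep -a -E -c "$pattern" "$sample" || true)
echo "matching lines in C locale: $bytewise"

rm -f "$sample"
```

Running the search under LC_ALL=C is one way to get byte-wise matching when the log is known to contain stray bytes; whether that is appropriate depends on whether the pattern itself relies on multibyte characters.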
bug-grep@HIDDEN
:bug#22059
; Package grep
.
Received: (at submit) by debbugs.gnu.org; 30 Nov 2015 07:22:36 +0000
Message-ID: <565BD753.7020507@HIDDEN>
Date: Mon, 30 Nov 2015 10:27:55 +0530
From: Charles <c@HIDDEN>
To: bug-grep@HIDDEN
Subject: grep -E: unexpected behaviour

As expected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'

But add the i to the pattern and the behaviour is unexpected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* i' /var/log/syslog.1
[no output]

Apparently grep silently stops processing when it encounters the invalid UTF-8:

# grep -E --only-matching 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | tail -1
udisksd[2650]: The string `TSSTcorp CDDVDW

In case the specific unusual characters are relevant, here they are in hex:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | head -1 | cut --delimiter=' ' --fields=10-11 | od -x
0000000 4853 8251 f265 88d0 b120 b8d3 4dbe e655
0000020 45ed e8b3 e342 4cc4 0a27
0000032

When the input has invalid characters so grep cannot process it, a message could be expected perhaps configurable by the -s/--no-messages option because the input is (sort of) unreadable.

Version: 2.20 from the Debian Jessie package 2.20-4.1

Charles
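A note on reading the hex dump in the report: `od -x` prints 16-bit words in host byte order, so on a little-endian machine each pair appears swapped (`4853` is the bytes 53 48, "SH"). The sketch below prints the first five bytes one at a time, in file order; the bytes are retyped from the dump rather than taken from the original log.

```shell
#!/bin/sh
# First five bytes of the dumped fields, in file order: S H Q 0x82 e.
# od -An drops the offset column; -tx1 prints one hex group per byte.
# tr squeezes the padding so the result is a single space-separated line.
bytes=$(printf 'SHQ\202e' | od -An -tx1 | tr -s ' \n' ' ')
echo "$bytes"
```

Read this way, the byte after "SHQ" is 0x82, a UTF-8 continuation byte with no lead byte before it, which is an encoding error and is consistent with the `--only-matching` output ending before "SHQ".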
Charles <c@HIDDEN>
:bug-grep@HIDDEN
.
bug-grep@HIDDEN
:bug#22059
; Package grep
.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.