From: Charles <c@HIDDEN>
Subject: bug#22059: grep -E: unexpected behaviour
Date: Mon, 30 Nov 2015 10:27:55 +0530
Message-ID: <565BD753.7020507@HIDDEN>
X-GNU-PR-Message: report 22059
X-GNU-PR-Package: grep
X-Debbugs-Original-To: bug-grep@HIDDEN

As expected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'

But add the i to the pattern and the behaviour is unexpected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* i' /var/log/syslog.1
[no output]

Apparently grep silently stops processing when it encounters the invalid UTF-8:

# grep -E --only-matching 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | tail -1
udisksd[2650]: The string `TSSTcorp CDDVDW

In case the specific unusual characters are relevant, here they are in hex:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | head -1 | cut --delimiter=' ' --fields=10-11 | od -x
0000000 4853 8251 f265 88d0 b120 b8d3 4dbe e655
0000020 45ed e8b3 e342 4cc4 0a27
0000032

When the input has invalid characters so grep cannot process it, a message could be expected perhaps configurable by the -s/--no-messages option because the input is (sort of) unreadable.

Version: 2.20 from the Debian Jessie package 2.20-4.1

Charles

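The hex dump in the report can be checked without grep. The following is a minimal sketch, assuming the dump came from a little-endian host (od -x prints 16-bit words, so each pair of bytes appears swapped) and that an iconv command is available; the function name sample_bytes is made up for illustration. It rebuilds the 26 bytes and asks iconv whether they are valid UTF-8.

```shell
# Bytes reconstructed from the od -x dump, assuming a little-endian host:
# the word "4853" is the byte pair 53 48 ("SH"), "8251" is 51 82, and so on.
# Octal escapes are used because printf '\x..' is not portable.
sample_bytes() {
    printf 'SHQ\202e\362\320\210 \261\323\270\276MU\346\355E\263\350B\343\304L\047\n'
}

# Sanity check: the dump ends at offset 0000032 (octal), which is 26 bytes.
nbytes=$(sample_bytes | wc -c)

# iconv exits with a non-zero status when the input is not valid
# in the encoding named by -f.
if sample_bytes | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    valid=yes
else
    valid=no
fi

echo "bytes: $((nbytes)), valid UTF-8: $valid"
```

The first offending byte is 0x82 (octal 202) right after "SHQ": it is a UTF-8 continuation byte with no lead byte before it, which is consistent with the --only-matching output stopping at "CDDVDW ".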
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Charles <c@HIDDEN>
Subject: bug#22059: Acknowledgement (grep -E: unexpected behaviour)
Date: Mon, 30 Nov 2015 07:23:02 +0000
Message-ID: <handler.22059.B.144886815619681.ack <at> debbugs.gnu.org>
References: <565BD753.7020507@HIDDEN>
Reply-To: 22059 <at> debbugs.gnu.org
X-Gnu-PR-Message: ack 22059
X-Gnu-PR-Package: grep

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message has been received.

Your message is being forwarded to the package maintainers and other interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-grep@HIDDEN

If you wish to submit further information on this problem, please send it to 22059 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish to report a problem with the Bug-tracking system.

-- 
22059: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22059
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems

From: Paul Eggert <eggert@HIDDEN>
To: Charles <c@HIDDEN>, 22059 <at> debbugs.gnu.org
Subject: bug#22059: grep -E: unexpected behaviour
Date: Mon, 30 Nov 2015 09:27:25 -0800
Organization: UCLA Computer Science Department
Message-ID: <565C86FD.70909@HIDDEN>
In-Reply-To: <565BD753.7020507@HIDDEN>
References: <565BD753.7020507@HIDDEN>
X-GNU-PR-Message: followup 22059
X-GNU-PR-Package: grep

On 11/29/2015 08:57 PM, Charles wrote:
> Apparently grep silently stops processing when it encounters the invalid UTF-8:

The regular expression "." matches a single character, and ".*" matches a string of characters. In your example, there is an encoding error, and encoding errors are not characters so "." and ".*" do not match them. I don't see any bug here.

> When the input has invalid characters so grep cannot process it, a message could be expected

That's a good suggestion, yes.

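The point about "." matching characters rather than bytes can be tried on a one-line input. This is a rough sketch, not taken from the thread: the sample line, the function name sample_line, and the locale name en_US.UTF-8 are all assumptions, and the behaviour described is that of recent GNU grep. In the C locale every byte is a character, so ".*" can span a stray 0x82 byte; in a UTF-8 locale that byte is an encoding error, and per the explanation above ".*" is not expected to cross it.

```shell
# One line with a stray 0x82 byte (octal 202) between "a " and " i".
# On its own, 0x82 is a UTF-8 continuation byte, i.e. an encoding error.
sample_line() {
    printf 'a \202 i\n'
}

# C locale: every byte counts as a character, so ".*" can match 0x82.
count_c=$(sample_line | LC_ALL=C grep -c -E 'a .* i')
echo "C locale, matching lines: $count_c"

# UTF-8 locale (en_US.UTF-8 is an assumption and may not be installed,
# in which case grep may fall back to C-locale behaviour).
# grep -c exits non-zero when nothing matches, hence the "|| true".
count_utf8=$(sample_line | LC_ALL=en_US.UTF-8 grep -c -E 'a .* i' || true)
echo "UTF-8 locale, matching lines: $count_utf8"
```

Comparing the two counts on a given system shows whether the pattern failed because of the data or because of how the locale classifies the bytes.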
From: Paul Eggert <eggert@HIDDEN>
To: control <at> debbugs.gnu.org
Subject: grep bug maintenance
Date: Thu, 31 Dec 2015 00:55:15 -0800
Organization: UCLA Computer Science Department
Message-ID: <5684ED73.6060403@HIDDEN>

severity 22059 wishlist
severity 21865 wishlist
close 22278
close 22279
close 21755
close 21700
tags 21554 wontfix
tags 21527 moreinfo

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.