GNU bug report logs - #20657
Traditional range expression not accepted in regex/dfa

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: grep; Severity: wishlist; Reported by: arnold@HIDDEN; dated Tue, 26 May 2015 02:43:02 UTC; Maintainer for grep is bug-grep@HIDDEN.
Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 20657 <at> debbugs.gnu.org:


Received: (at 20657) by debbugs.gnu.org; 26 May 2015 06:53:43 +0000
Message-ID: <5564186B.90208@HIDDEN>
Date: Mon, 25 May 2015 23:53:31 -0700
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: arnold@HIDDEN, 20657 <at> debbugs.gnu.org
Subject: Re: bug#20657: Traditional range expression not accepted in regex/dfa
References: <201505260242.t4Q2gKwH007024@HIDDEN>
In-Reply-To: <201505260242.t4Q2gKwH007024@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

arnold@HIDDEN wrote:

> The bugaboo here is the "---"; it's
> a range expression consisting of minus through minus, and apparently long
> ago was how one got a minus into a bracket expression.

Actually, long ago expressions like '[^0-9-]' worked just as they do now, and it 
wasn't ever necessary to use trailing "---".  That being said, it is true that 
in 7th Edition Unix '[^0-9---]' meant the same thing as '[^0-9-]', so in that 
sense we have an incompatibility with 7th Edition Unix here.

> 	$ ./src/grep '[^0-9---]' /dev/null
> 	./src/grep: Invalid range end
>
> The underlying regex and, I believe, dfa routines don't accept this.

Yes, that's correct.  It's not a bug, though, as the regexp is ambiguous and 
does not conform to POSIX, which says the following about RE bracket 
expressions: "To use a <hyphen> as the starting range point, it shall either 
come first in the bracket expression or be specified as a collating symbol; for 
example, "[][.-.]-0]", which matches either a <right-square-bracket> or any 
character or collating element that collates between <hyphen> and 0, inclusive." 
<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05> 
In your correspondent's example, the hyphen is a starting range point but is 
neither first in the bracket expression nor is specified as a collating symbol, 
so the regexp doesn't conform to POSIX.
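The conforming spelling can be checked with any POSIX grep. What follows is a minimal sketch, not part of the original exchange, and the helper name non_digit_or_hyphen is hypothetical. It puts the hyphen last in the bracket expression, where POSIX treats it as a literal character rather than a range point.

```shell
# Hypothetical helper: print the input lines that contain at least one
# character that is neither a digit nor a hyphen.
non_digit_or_hyphen() {
    # The final '-' is last in the bracket expression, so it is literal.
    # LC_ALL=C keeps the meaning of the 0-9 range independent of the locale.
    LC_ALL=C grep '[^0-9-]'
}

# Only 'abc' contains a character outside the set of digits and hyphen.
printf '%s\n' '2015-05-26' 'abc' '42' | non_digit_or_hyphen
```

Of the three sample lines, only abc is selected; the other two consist solely of digits and hyphens.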

Even though it's not a bug I suppose it wouldn't hurt to make the GNU matchers 
compatible with 7th Edition Unix here, if someone really wants to take that task 
on; it's not urgent, though.




Information forwarded to bug-grep@HIDDEN:
bug#20657; Package grep.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 26 May 2015 02:42:39 +0000
From: arnold@HIDDEN
Message-Id: <201505260242.t4Q2gKwH007024@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Tue, 26 May 2015 05:42:19 +0300
To: bug-grep@HIDDEN
Subject: Traditional range expression not accepted in regex/dfa
User-Agent: Heirloom mailx 12.5 6/20/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi.

I received a bug report for gawk by private email that a regexp of
this form: '[^0-9---]' wasn't accepted.  The bugaboo here is the "---"; it's
a range expression consisting of minus through minus, and apparently long
ago was how one got a minus into a bracket expression.

This can be seen in current grep also:

	$ ./src/grep --version
	./src/grep (GNU grep) 2.21
	Copyright (C) 2014 Free Software Foundation, Inc.
	...

	$ ./src/grep '[^0-9---]' /dev/null
	./src/grep: Invalid range end

The underlying regex and, I believe, dfa routines don't accept this.
Fixing either of them is beyond my skill range, so I thought I'd
pass this one upstream to you folks.
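The reproduction above depends on reading the error message. A script can check the exit status instead, since POSIX specifies that grep exits with a status greater than 1 when an error occurs, and with 1 when no lines are selected. The following is a minimal sketch under that assumption; the helper name accepts_pattern is hypothetical and not part of grep.

```shell
# Hypothetical helper: succeed if the local grep accepts the pattern "$1".
# Searching /dev/null never selects a line, so an accepted pattern gives
# exit status 1 and a rejected pattern gives a status of 2 or more.
accepts_pattern() {
    status=0
    grep -e "$1" /dev/null >/dev/null 2>&1 || status=$?
    [ "$status" -lt 2 ]
}

# Report how the local grep treats the conforming and traditional spellings.
for pat in '[^0-9-]' '[^0-9---]'; do
    if accepts_pattern "$pat"; then
        echo "accepted: $pat"
    else
        echo "rejected: $pat"
    fi
done
```

The result for '[^0-9---]' depends on the grep implementation and version; GNU grep 2.21, as shown in this report, rejects it.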

Thanks!

Arnold




Acknowledgement sent to arnold@HIDDEN:
New bug report received and forwarded. Copy sent to bug-grep@HIDDEN.
Report forwarded to bug-grep@HIDDEN:
bug#20657; Package grep.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.