GNU bug report logs - #62483
echo a | grep -E -w '((()|a)|())*' # does not terminate

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: grep; Reported by: Koen Claessen <koen@HIDDEN>; dated Mon, 27 Mar 2023 13:15:05 UTC; Maintainer for grep is bug-grep@HIDDEN.

Message received at 62483 <at> debbugs.gnu.org:


Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 18:29:50 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 14:29:50 2023
Received: from localhost ([127.0.0.1]:42640 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pj2Sc-0004O8-7q
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 14:29:50 -0400
Received: from mail.cs.ucla.edu ([131.179.128.66]:39294)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eggert@HIDDEN>) id 1pj2SX-0004Ns-KW
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 14:29:48 -0400
Received: from localhost (localhost [127.0.0.1])
 by mail.cs.ucla.edu (Postfix) with ESMTP id AEBCE3C09FA02;
 Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
Received: from mail.cs.ucla.edu ([127.0.0.1])
 by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id oyIQv3a3PBiJ; Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
 by mail.cs.ucla.edu (Postfix) with ESMTP id 752483C09FA01;
 Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
DKIM-Filter: OpenDKIM Filter v2.10.3 mail.cs.ucla.edu 752483C09FA01
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu;
 s=9D0B346E-2AEB-11ED-9476-E14B719DCE6C; t=1680460179;
 bh=2f3sGrU+5qn62ukHicLKudSRcxmJsffhDbks/B5+0OI=;
 h=Message-ID:Date:MIME-Version:To:From;
 b=mnBN4wAUHPlu43V8zbNB9T9pxfGf6HyO7gPzgnJgbfqqBXvW/gp9J17F6/nI3rVGT
 +rrkdp/1IUPB5eBvCcQVeBn2RpKvqC13iaomLjQ4AVagHIh4UuiftUMSw2cWamRB6W
 Y2dQr4gZNvJvDgfPnhopgZd8M4Vmb2Ndoa3j7c9n6TEAp74gomzh9DqX43neTH8sVs
 NlK6CuUZbnfuu3E69cBE1NrvW6XQqr9vz0xfj5qILzwTiilGWONSjPfOer4tt5nzRl
 ZLb3+e1lUwDk6I/8kVFh3q3CE7Py8BCBNZuXWPyDbQPObf+WY4giKoTmeImHUqXHFh
 23i2Aa5w9ntag==
X-Virus-Scanned: amavisd-new at mail.cs.ucla.edu
Received: from mail.cs.ucla.edu ([127.0.0.1])
 by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id xT4zfD34fpKg; Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
Received: from [192.168.1.9] (cpe-172-91-119-151.socal.res.rr.com
 [172.91.119.151])
 by mail.cs.ucla.edu (Postfix) with ESMTPSA id 4A99F3C097AFE;
 Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
Message-ID: <576a26bb-b195-dd33-a5a4-ba38115889d5@HIDDEN>
Date: Sun, 2 Apr 2023 11:29:39 -0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.9.0
Content-Language: en-US
To: arnold@HIDDEN, dimitry@HIDDEN
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.1 (-)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.1 (--)

On 2023-04-01 23:52, arnold@HIDDEN wrote:
> It's interesting, as gawk uses the same regex, but with different flags.

Also, GNU grep -w passes the following more-complicated regexp to dfaparse:

   (^|[^[:alnum:]_])(((()|a)|())*)([^[:alnum:]_]|$)

and quite possibly the bug is related to this extra complexity.




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at 62483 <at> debbugs.gnu.org:


Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 10:04:12 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 06:04:12 2023
Received: from localhost ([127.0.0.1]:39237 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piuZI-0000fR-43
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:12 -0400
Received: from frenzy.freefriends.org ([198.99.81.75]:59844
 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1piuZG-0000fJ-Gp
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:11 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 332A3umP023104
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Sun, 2 Apr 2023 04:03:57 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 332A3uGT023103;
 Sun, 2 Apr 2023 04:03:56 -0600
From: arnold@HIDDEN
Message-Id: <202304021003.332A3uGT023103@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 04:03:56 -0600
To: dimitry@HIDDEN, arnold@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
 <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
In-Reply-To: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.0 (+)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

OK, thanks!

Dimitry Andric <dimitry@HIDDEN> wrote:

> Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
> debugging purposes added CFLAGS="-O0 -g".
>
> In any case, the problematic code is both in glibc and grep, as I
> believe these are originating from the same source.
>
> -Dimitry
>
> > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
> > 
> > Hi.
> > 
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> > 
> >> Yes, it still reproduces when I configure the latest grep using
> >> --without-included-regex:
> > 
> > I assume you meant --with-included-regex?
> > 
> > If you really used --without-included-regex, that doesn't prove anything... :-)
> > 
> > It's interesting, as gawk uses the same regex, but with different flags.
> > It might be worth trying to understand which of the syntax bits
> > is causing this.
> > 
> > Thanks,
> > 
> > Arnold
> > 
> >> 
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> (gdb)
> >> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >> 1: idx = 0
> >> (gdb)
> >> 1432          if (__glibc_unlikely (cur_node < 0))
> >> 1: idx = 0
> >> (gdb)
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> 
> >> The endless loop looks the same. grep's regexec.c is mostly the same as
> >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> >> directives added.
> >> 
> >> -Dimitry
> >> 
> >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> >>> 
> >>> This does not reproduce with gawk, even when I force use of the regex
> >>> matcher.
> >>> 
> >>> What happens if grep is built with the included regex files instead of
> >>> relying on the ones in the local glibc?
> >>> 
> >>> Arnold
> >>> 
> >>> Dimitry Andric <dimitry@HIDDEN> wrote:
> >>> 
> >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>>>> 
> >>>>> Running the command:
> >>>>> 
> >>>>> echo a | grep -E -w '((()|a)|())*'
> >>>>> 
> >>>>> does not terminate, and uses a LOT of processor time, for all versions of
> >>>>> grep I have tried.
> >>>>> 
> >>>>> This is the smallest case that could be found; simplifying anything in the
> >>>>> input and/or expression leads to correct behavior.
> >>>> 
> >>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>>> 
> >>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
> >>>> 
> >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >>>> 
> >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >>>> 
> >>>> 1378  static reg_errcode_t
> >>>> 1379  __attribute_warn_unused_result__
> >>>> 1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
> >>>> 1382  {
> >>>> ...
> >>>> 1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>>> 1416      {
> >>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>>> 1418
> >>>> 1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >>>> ...
> >>>> 1442        /* Proceed to next node.  */
> >>>> 1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>>> 1444                                      &idx, cur_node,
> >>>> 1445                                      &eps_via_nodes, fs);
> >>>> 1446
> >>>> 1447        if (__glibc_unlikely (cur_node < 0))
> >>>> ...
> >>>> 1465          }
> >>>> 1466      }
> >>>> 
> >>>> -Dimitry
> >>>> 
> >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >>>> 
> >> 
>




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 10:04:18 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 06:04:18 2023
Received: from localhost ([127.0.0.1]:39240 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piuZN-0000fk-QN
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:18 -0400
Received: from lists.gnu.org ([209.51.188.17]:39736)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1piuZL-0000fb-ID
 for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:16 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1piuZK-000719-JP
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 06:04:14 -0400
Received: from frenzy.freefriends.org ([198.99.81.75] helo=freefriends.org)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1piuZI-0002di-Og
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 06:04:14 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 332A3umP023104
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Sun, 2 Apr 2023 04:03:57 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 332A3uGT023103;
 Sun, 2 Apr 2023 04:03:56 -0600
From: arnold@HIDDEN
Message-Id: <202304021003.332A3uGT023103@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 04:03:56 -0600
To: dimitry@HIDDEN, arnold@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
 <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
In-Reply-To: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Received-SPF: none client-ip=198.99.81.75; envelope-from=arnold@HIDDEN;
 helo=freefriends.org
X-Spam_score_int: -14
X-Spam_score: -1.5
X-Spam_bar: -
X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, FORGED_SPF_HELO=0.001,
 KHOP_HELO_FCRDNS=0.399, SPF_HELO_PASS=-0.001,
 SPF_NONE=0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

OK, thanks!

Dimitry Andric <dimitry@HIDDEN> wrote:

> Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
> debugging purposes added CFLAGS="-O0 -g".
>
> In any case, the problematic code is both in glibc and grep, as I
> believe these are originating from the same source.
>
> -Dimitry
>
> > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
> > 
> > Hi.
> > 
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> > 
> >> Yes, it still reproduces when I configure the latest grep using
> >> --without-included-regex:
> > 
> > I assume you meant --with-included-regex?
> > 
> > If you really used --without-included-regex, that doesn't prove anything... :-)
> > 
> > It's interesting, as gawk uses the same regex, but with different flags.
> > It might be worth trying to understand which of the syntax bits
> > is causing this.
> > 
> > Thanks,
> > 
> > Arnold
> > 
> >> 
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> (gdb)
> >> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >> 1: idx = 0
> >> (gdb)
> >> 1432          if (__glibc_unlikely (cur_node < 0))
> >> 1: idx = 0
> >> (gdb)
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> 
> >> The endless loop looks the same. grep's regexec.c is mostly the same as
> >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> >> directives added.
> >> 
> >> -Dimitry
> >> 
> >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> >>> 
> >>> This does not reproduce with gawk, even when I force use of the regex
> >>> matcher.
> >>> 
> >>> What happens if grep is built with the included regex files instead of
> >>> relying on the ones in the local glibc?
> >>> 
> >>> Arnold
> >>> 
> >>> Dimitry Andric <dimitry@HIDDEN> wrote:
> >>> 
> >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>>>> 
> >>>>> Running the command:
> >>>>> 
> >>>>> echo a | grep -E -w '((()|a)|())*'
> >>>>> 
> >>>>> does not terminate, and uses a LOT of processor time, for all versions of
> >>>>> grep I have tried.
> >>>>> 
> >>>>> This is the smallest case that could be found; simplifying anything in the
> >>>>> input and/or expression leads to correct behavior.
> >>>> 
> >>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>>> 
> >>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
> >>>> 
> >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >>>> 
> >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >>>> 
> >>>> 1378  static reg_errcode_t
> >>>> 1379  __attribute_warn_unused_result__
> >>>> 1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
> >>>> 1382  {
> >>>> ...
> >>>> 1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>>> 1416      {
> >>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>>> 1418
> >>>> 1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >>>> ...
> >>>> 1442        /* Proceed to next node.  */
> >>>> 1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>>> 1444                                      &idx, cur_node,
> >>>> 1445                                      &eps_via_nodes, fs);
> >>>> 1446
> >>>> 1447        if (__glibc_unlikely (cur_node < 0))
> >>>> ...
> >>>> 1465          }
> >>>> 1466      }
> >>>> 
> >>>> -Dimitry
> >>>> 
> >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >>>> 
> >> 
>




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 09:34:02 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 05:34:02 2023
Received: from localhost ([127.0.0.1]:39227 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piu65-0008AY-Hx
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:34:02 -0400
Received: from lists.gnu.org ([209.51.188.17]:45166)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1piu64-0008AQ-1g
 for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:34:00 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
 id 1piu62-0002Ka-6N
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 05:33:59 -0400
Received: from tensor.andric.com ([87.251.56.140])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
 id 1piu5y-00074R-0e
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 05:33:55 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 6ACDF6C235;
 Sun,  2 Apr 2023 11:33:48 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1680428028; bh=j++piMyInCb64AqGnHXFa/agOcQZYX5hFJCloNa82XE=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=OvpoxsYZyxsgQkhRl/xlrsYK62rT5hKW4sMbC0u3aDCYe7gYbalcM0EcbVcuZmsEp
 Ajq9eUTN6tzSLEXBOVFlEwFzoS/gkuLXqmKFqU4TKOJpsjfoXca+iNS3RN7w7k6HAP
 FyHBj1TZpHtlFZQW6GWjX1LTGv9+R+XmXi7cSlUAHzDwjONBsFO40E29XbZDoqHYdD
 xzUcUJSN1hpy7an+xzhBje/vvne3qpIEbReGf/Aab0z0j/KCjM6GT81+SKIog+el7H
 vLVb8tU69x6+y1CyoGlsfZ0IY17+aUdgLalblC2o7G4l8UObQXMmN1jOZ7QLq9QPgr
 OWfWv7UbnXYdg==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN>
Date: Sun, 2 Apr 2023 11:33:42 +0200
Message-Id: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
To: arnold@HIDDEN
X-Mailer: Apple Mail (2.3731.500.231)
Received-SPF: pass client-ip=87.251.56.140; envelope-from=dimitry@HIDDEN;
 helo=tensor.andric.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Debbugs-Envelope-To: submit
Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)


--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
debugging purposes added CFLAGS=3D"-O0 -g".

In any case, the problematic code is both in glibc and grep, as I
believe these are originating from the same source.

-Dimitry

> On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
>=20
> Hi.
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> Yes, it still reproduces when I configure the latest grep using
>> --without-included-regex:
>=20
> I assume you meant --with-included-regex?
>=20
> If you really used --without-included-regex, that doesn't prove =
anything... :-)
>=20
> It's interesting, as gawk uses the same regex, but with different =
flags.
> It might be worth trying to understand which of the syntax bits
> is causing this.
>=20
> Thanks,
>=20
> Arnold
>=20
>>=20
>> 1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>> (gdb)
>> 1428          cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>> 1: idx =3D 0
>> (gdb)
>> 1432          if (__glibc_unlikely (cur_node < 0))
>> 1: idx =3D 0
>> (gdb)
>> 1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>>=20
>> The endless loop looks the same. grep's regexec.c is mostly the same =
as
>> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
>> directives added.
>>=20
>> -Dimitry
>>=20
>>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>>>=20
>>> This does not reproduce with gawk, even when I force use of the =
regex
>>> matcher.
>>>=20
>>> What happens if grep is built with the included regex files instead =
of
>>> relying on the ones in the local glibc?
>>>=20
>>> Arnold
>>>=20
>>> Dimitry Andric <dimitry@HIDDEN> wrote:
>>>=20
>>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>>>=20
>>>>> Running the command:
>>>>>=20
>>>>> echo a | grep -E -w '((()|a)|())*'
>>>>>=20
>>>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>>>> grep I have tried.
>>>>>=20
>>>>> This is the smallest case that could be found; simplifying =
anything in the
>>>>> input and/or expression leads to correct behavior.
>>>>=20
>>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>>>=20
>>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     =
TIME+ COMMAND
>>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   =
0:08.32 grep -E -w ((()|a)|())*
>>>>=20
>>>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>>>=20
>>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>>>=20
>>>> 1378  static reg_errcode_t
>>>> 1379  __attribute_warn_unused_result__
>>>> 1380  set_regs (const regex_t *preg, const re_match_context_t =
*mctx, size_t nmatch,
>>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
>>>> 1382  {
>>>> ...
>>>> 1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>>> 1416      {
>>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>>>> 1418
>>>> 1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>>>> ...
>>>> 1442        /* Proceed to next node.  */
>>>> 1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>>> 1444                                      &idx, cur_node,
>>>> 1445                                      &eps_via_nodes, fs);
>>>> 1446
>>>> 1447        if (__glibc_unlikely (cur_node < 0))
>>>> ...
>>>> 1465          }
>>>> 1466      }
>>>>=20
>>>> -Dimitry
>>>>=20
>>>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>>>=20
>>=20


--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZClL9gAKCRCwXqMKLiCW
o0KTAKCiIhcduUQaGUuiJfykNQNnR6+9UACg7EEpahAec8VtdrBL/qhtw0BjprA=
=F/9f
-----END PGP SIGNATURE-----

--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13--




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at 62483 <at> debbugs.gnu.org:


Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 09:33:54 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 05:33:54 2023
Received: from localhost ([127.0.0.1]:39224 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piu5x-0008AC-Vd
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:33:54 -0400
Received: from tensor.andric.com ([87.251.56.140]:53462 ident=postfix)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1piu5u-0008A0-Gk
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:33:52 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 6ACDF6C235;
 Sun,  2 Apr 2023 11:33:48 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1680428028; bh=j++piMyInCb64AqGnHXFa/agOcQZYX5hFJCloNa82XE=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=OvpoxsYZyxsgQkhRl/xlrsYK62rT5hKW4sMbC0u3aDCYe7gYbalcM0EcbVcuZmsEp
 Ajq9eUTN6tzSLEXBOVFlEwFzoS/gkuLXqmKFqU4TKOJpsjfoXca+iNS3RN7w7k6HAP
 FyHBj1TZpHtlFZQW6GWjX1LTGv9+R+XmXi7cSlUAHzDwjONBsFO40E29XbZDoqHYdD
 xzUcUJSN1hpy7an+xzhBje/vvne3qpIEbReGf/Aab0z0j/KCjM6GT81+SKIog+el7H
 vLVb8tU69x6+y1CyoGlsfZ0IY17+aUdgLalblC2o7G4l8UObQXMmN1jOZ7QLq9QPgr
 OWfWv7UbnXYdg==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN>
Date: Sun, 2 Apr 2023 11:33:42 +0200
Message-Id: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
To: arnold@HIDDEN
X-Mailer: Apple Mail (2.3731.500.231)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)


--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
debugging purposes added CFLAGS=3D"-O0 -g".

In any case, the problematic code is both in glibc and grep, as I
believe these are originating from the same source.

-Dimitry

> On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
>=20
> Hi.
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> Yes, it still reproduces when I configure the latest grep using
>> --without-included-regex:
>=20
> I assume you meant --with-included-regex?
>=20
> If you really used --without-included-regex, that doesn't prove =
anything... :-)
>=20
> It's interesting, as gawk uses the same regex, but with different =
flags.
> It might be worth trying to understand which of the syntax bits
> is causing this.
>=20
> Thanks,
>=20
> Arnold
>=20
>>=20
>> 1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>> (gdb)
>> 1428          cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>> 1: idx =3D 0
>> (gdb)
>> 1432          if (__glibc_unlikely (cur_node < 0))
>> 1: idx =3D 0
>> (gdb)
>> 1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>>=20
>> The endless loop looks the same. grep's regexec.c is mostly the same =
as
>> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
>> directives added.
>>=20
>> -Dimitry
>>=20
>>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>>>=20
>>> This does not reproduce with gawk, even when I force use of the =
regex
>>> matcher.
>>>=20
>>> What happens if grep is built with the included regex files instead =
of
>>> relying on the ones in the local glibc?
>>>=20
>>> Arnold
>>>=20
>>> Dimitry Andric <dimitry@HIDDEN> wrote:
>>>=20
>>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>>>=20
>>>>> Running the command:
>>>>>=20
>>>>> echo a | grep -E -w '((()|a)|())*'
>>>>>=20
>>>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>>>> grep I have tried.
>>>>>=20
>>>>> This is the smallest case that could be found; simplifying =
anything in the
>>>>> input and/or expression leads to correct behavior.
>>>>=20
>>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>>>=20
>>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     =
TIME+ COMMAND
>>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   =
0:08.32 grep -E -w ((()|a)|())*
>>>>=20
>>>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>>>=20
>>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>>>=20
>>>> 1378  static reg_errcode_t
>>>> 1379  __attribute_warn_unused_result__
>>>> 1380  set_regs (const regex_t *preg, const re_match_context_t =
*mctx, size_t nmatch,
>>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
>>>> 1382  {
>>>> ...
>>>> 1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>>> 1416      {
>>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>>>> 1418
>>>> 1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>>>> ...
>>>> 1442        /* Proceed to next node.  */
>>>> 1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>>> 1444                                      &idx, cur_node,
>>>> 1445                                      &eps_via_nodes, fs);
>>>> 1446
>>>> 1447        if (__glibc_unlikely (cur_node < 0))
>>>> ...
>>>> 1465          }
>>>> 1466      }
>>>>=20
>>>> -Dimitry
>>>>=20
>>>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>>>=20
>>=20


--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZClL9gAKCRCwXqMKLiCW
o0KTAKCiIhcduUQaGUuiJfykNQNnR6+9UACg7EEpahAec8VtdrBL/qhtw0BjprA=
=F/9f
-----END PGP SIGNATURE-----

--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13--




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at 62483 <at> debbugs.gnu.org:


Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 06:52:22 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 02:52:22 2023
Received: from localhost ([127.0.0.1]:39032 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pirZe-0003Td-3X
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:22 -0400
Received: from frenzy.freefriends.org ([198.99.81.75]:57092
 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1pirZb-0003TU-Kl
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:20 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 3326q3Cs024467
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Sun, 2 Apr 2023 00:52:04 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 3326q2Ut024466;
 Sun, 2 Apr 2023 00:52:02 -0600
From: arnold@HIDDEN
Message-Id: <202304020652.3326q2Ut024466@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 00:52:02 -0600
To: dimitry@HIDDEN, arnold@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
In-Reply-To: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.0 (+)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

Hi.

Dimitry Andric <dimitry@HIDDEN> wrote:

> Yes, it still reproduces when I configure the latest grep using
> --without-included-regex:

I assume you meant --with-included-regex?

If you really used --without-included-regex, that doesn't prove anything... :-)

It's interesting, as gawk uses the same regex, but with different flags.
It might be worth trying to understand which of the syntax bits
is causing this.

Thanks,

Arnold

>
> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
> (gdb)
> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> 1: idx = 0
> (gdb)
> 1432          if (__glibc_unlikely (cur_node < 0))
> 1: idx = 0
> (gdb)
> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
>
> The endless loop looks the same. grep's regexec.c is mostly the same as
> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> directives added.
>
> -Dimitry
>
> > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> > 
> > This does not reproduce with gawk, even when I force use of the regex
> > matcher.
> > 
> > What happens if grep is built with the included regex files instead of
> > relying on the ones in the local glibc?
> > 
> > Arnold
> > 
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> > 
> >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>> 
> >>> Running the command:
> >>> 
> >>> echo a | grep -E -w '((()|a)|())*'
> >>> 
> >>> does not terminate, and uses a LOT of processor time, for all versions of
> >>> grep I have tried.
> >>> 
> >>> This is the smallest case that could be found; simplifying anything in the
> >>> input and/or expression leads to correct behavior.
> >> 
> >> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >> 
> >>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
> >> 
> >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >> 
> >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >> 
> >>  1378  static reg_errcode_t
> >>  1379  __attribute_warn_unused_result__
> >>  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>  1381            regmatch_t *pmatch, bool fl_backtrack)
> >>  1382  {
> >> ...
> >>  1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>  1416      {
> >>  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>  1418
> >>  1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>  1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> ...
> >>  1442        /* Proceed to next node.  */
> >>  1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>  1444                                      &idx, cur_node,
> >>  1445                                      &eps_via_nodes, fs);
> >>  1446
> >>  1447        if (__glibc_unlikely (cur_node < 0))
> >> ...
> >>  1465          }
> >>  1466      }
> >> 
> >> -Dimitry
> >> 
> >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >> 
>




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 06:52:31 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 02:52:31 2023
Received: from localhost ([127.0.0.1]:39035 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pirZm-0003Tz-R9
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:31 -0400
Received: from lists.gnu.org ([209.51.188.17]:50340)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1pirZk-0003To-83
 for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:28 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1pirZh-0008FE-Kz
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 02:52:25 -0400
Received: from frenzy.freefriends.org ([198.99.81.75] helo=freefriends.org)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1pirZf-0006yO-Kg
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 02:52:25 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 3326q3Cs024467
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Sun, 2 Apr 2023 00:52:04 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 3326q2Ut024466;
 Sun, 2 Apr 2023 00:52:02 -0600
From: arnold@HIDDEN
Message-Id: <202304020652.3326q2Ut024466@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 00:52:02 -0600
To: dimitry@HIDDEN, arnold@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
In-Reply-To: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Received-SPF: none client-ip=198.99.81.75; envelope-from=arnold@HIDDEN;
 helo=freefriends.org
X-Spam_score_int: -14
X-Spam_score: -1.5
X-Spam_bar: -
X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, FORGED_SPF_HELO=0.001,
 KHOP_HELO_FCRDNS=0.399, SPF_HELO_PASS=-0.001,
 SPF_NONE=0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi.

Dimitry Andric <dimitry@HIDDEN> wrote:

> Yes, it still reproduces when I configure the latest grep using
> --without-included-regex:

I assume you meant --with-included-regex?

If you really used --without-included-regex, that doesn't prove anything... :-)

It's interesting, as gawk uses the same regex, but with different flags.
It might be worth trying to understand which of the syntax bits
is causing this.

Thanks,

Arnold

>
> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
> (gdb)
> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> 1: idx = 0
> (gdb)
> 1432          if (__glibc_unlikely (cur_node < 0))
> 1: idx = 0
> (gdb)
> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
>
> The endless loop looks the same. grep's regexec.c is mostly the same as
> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> directives added.
>
> -Dimitry
>
> > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> > 
> > This does not reproduce with gawk, even when I force use of the regex
> > matcher.
> > 
> > What happens if grep is built with the included regex files instead of
> > relying on the ones in the local glibc?
> > 
> > Arnold
> > 
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> > 
> >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>> 
> >>> Running the command:
> >>> 
> >>> echo a | grep -E -w '((()|a)|())*'
> >>> 
> >>> does not terminate, and uses a LOT of processor time, for all versions of
> >>> grep I have tried.
> >>> 
> >>> This is the smallest case that could be found; simplifying anything in the
> >>> input and/or expression leads to correct behavior.
> >> 
> >> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >> 
> >>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
> >> 
> >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >> 
> >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >> 
> >>  1378  static reg_errcode_t
> >>  1379  __attribute_warn_unused_result__
> >>  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>  1381            regmatch_t *pmatch, bool fl_backtrack)
> >>  1382  {
> >> ...
> >>  1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>  1416      {
> >>  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>  1418
> >>  1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>  1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> ...
> >>  1442        /* Proceed to next node.  */
> >>  1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>  1444                                      &idx, cur_node,
> >>  1445                                      &eps_via_nodes, fs);
> >>  1446
> >>  1447        if (__glibc_unlikely (cur_node < 0))
> >> ...
> >>  1465          }
> >>  1466      }
> >> 
> >> -Dimitry
> >> 
> >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >> 
>




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at 62483 <at> debbugs.gnu.org:


Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 04:15:29 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 00:15:29 2023
Received: from localhost ([127.0.0.1]:38815 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pip7p-00072N-2s
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 00:15:29 -0400
Received: from mail-lj1-f177.google.com ([209.85.208.177]:41554)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <meyering@HIDDEN>) id 1pip7m-000728-2d
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 00:15:27 -0400
Received: by mail-lj1-f177.google.com with SMTP id bx10so8705350ljb.8
 for <62483 <at> debbugs.gnu.org>; Sat, 01 Apr 2023 21:15:26 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112; t=1680408920;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=DodkkdU3opFDpkTR7I38Cd8sH5Ml6OzzLr47sEPUB+w=;
 b=xz/uM78Xeo+KZkaGtVa1P9mQjfLV1aivcei2WEfZNoOFMTOkBwpKonRG/ba1JbNr83
 pYR+P1Z7IOyoWVpnhev0bwAe6bG4XFDHsU+cW10ImcfNNcmiK1ArWvyofz13j+DNKQ5n
 /vWdkbrQkfucGebgx/sLK7Jdh8OA0tZt1gbZNJnPjeTPcopGYPDCigA2gGQoEBVawv7g
 4x6kAG7avDpCkKtc8U9mnvqk+UI472MyheOtQNwoE3M/v2Hv8m9zsDYT6bHBxxGdDYle
 4lcLUHMuIgzGnfkj3/hevOpflvPGZxjPyKTr8WEjYK3ZapFlvI9VQTWpnbJK5hO6Xz77
 hAxw==
X-Gm-Message-State: AAQBX9e7GYXZxDzQFlrGuxweB9jVGPGIK9j3o02zJXm7N/ETVFXjGl10
 mYjRz0gfA7+RtfFFOJlo5APyBjtu4o0gOz8KPHQ=
X-Google-Smtp-Source: AKy350ZblXwqXc5nXQDfPymdw4nPR2IN4imJcGePAqD1vZp9OXVu/8aqbvMVAAgh/JkXmi0+y0PDQvbteeLWQRYc1Jg=
X-Received: by 2002:a2e:95d8:0:b0:29a:9053:ed1b with SMTP id
 y24-20020a2e95d8000000b0029a9053ed1bmr9738568ljh.3.1680408919857; Sat, 01 Apr
 2023 21:15:19 -0700 (PDT)
MIME-Version: 1.0
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
In-Reply-To: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
From: Jim Meyering <jim@HIDDEN>
Date: Sat, 1 Apr 2023 21:15:07 -0700
Message-ID: <CA+8g5KHc9W4dBJufWv96wd53ZE4fcYxrynuUEh9vNR-_1LFYbQ@HIDDEN>
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
To: Koen Claessen <koen@HIDDEN>
Content-Type: multipart/mixed; boundary="000000000000bfb5b205f852b0a6"
X-Spam-Score: 0.2 (/)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.8 (/)

--000000000000bfb5b205f852b0a6
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Mar 27, 2023 at 6:15=E2=80=AFAM Koen Claessen <koen@HIDDEN> wr=
ote:
> Running the command:
>
>   echo a | grep -E -w '((()|a)|())*'
>
> does not terminate, and uses a LOT of processor time, for all versions of
> grep I have tried.
>
> This is the smallest case that could be found; simplifying anything in th=
e
> input and/or expression leads to correct behavior.

Thank you! How did you find that?

FYI, this strikes grep-3.10 (on Fedora 37/glibc-2.36-9.fc37.x86_64)
when using LC_ALL=3Den_US.UTF-8, but not with LC_ALL=3DC.
I.e., this infloops:
   echo a | LC_ALL=3Den_US.UTF-8 grep -E -w '((()|a)|())*'

but this works as expected and promptly prints its line of input:
     echo a | LC_ALL=3DC grep -E -w '((()|a)|())*'

For now, I've added an expected-failing test case for this bug:

--000000000000bfb5b205f852b0a6
Content-Type: application/octet-stream; name="grep-glibc-infloop.patch"
Content-Disposition: attachment; filename="grep-glibc-infloop.patch"
Content-Transfer-Encoding: base64
Content-ID: <f_lfyw0lqb0>
X-Attachment-Id: f_lfyw0lqb0

RnJvbSBlNDQ4ZTg2YjVjMDFjMGE5OGIxNmU4NGQ2MWVkZTc5NDAxNzBiYjViIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog
U2F0LCAxIEFwciAyMDIzIDIxOjAxOjAxIC0wNzAwClN1YmplY3Q6IFtQQVRDSF0gdGVzdHM6IGFk
ZCBhIGtub3duLWZhaWxpbmcgZ2xpYmMtaW5mbG9vcCB0ZXN0CgoqIHRlc3RzL2dsaWJjLWluZmxv
b3A6IE5ldyBmaWxlLgpCYXNlZCBvbiB0aGUgY29tbWFuZCBmcm9tIEtvZW4gQ2xhZXNzZW4KcmVw
b3J0ZWQgaW4gaHR0cHM6Ly9idWdzLmdudS5vcmcvNjI0ODMKKiB0ZXN0cy9NYWtlZmlsZS5hbSAo
VEVTVFMpOiBBZGQgdGhlIGZpbGUgbmFtZQoqIFRIQU5LUy5pbjogQWRkIG5hbWUgb2YgcmVwb3J0
ZXIuCi0tLQogVEhBTktTLmluICAgICAgICAgICB8ICAxICsKIHRlc3RzL01ha2VmaWxlLmFtICAg
fCAgMyArKysKIHRlc3RzL2dsaWJjLWluZmxvb3AgfCAxMyArKysrKysrKysrKysrCiAzIGZpbGVz
IGNoYW5nZWQsIDE3IGluc2VydGlvbnMoKykKIGNyZWF0ZSBtb2RlIDEwMDc1NSB0ZXN0cy9nbGli
Yy1pbmZsb29wCgpkaWZmIC0tZ2l0IGEvVEhBTktTLmluIGIvVEhBTktTLmluCmluZGV4IGZkMzYx
OTIuLjA2ZjhjNTQgMTAwNjQ0Ci0tLSBhL1RIQU5LUy5pbgorKysgYi9USEFOS1MuaW4KQEAgLTU2
LDYgKzU2LDcgQEAgS2FybCBQZXR0ZXJzc29uICAgICAgICAgICAgICAgICAgICAga2FybC5wZXR0
ZXJzc29uQGtscG4uc2UKIEthdmVoIFIuIEdoYXppICAgICAgICAgICAgICAgICAgICAgIGdoYXpp
QGNhaXAucnV0Z2Vycy5lZHUKIEthenVybyBGdXJ1a2F3YSAgICAgICAgICAgICAgICAgICAgIGZ1
cnVrYXdhQGFwcmljb3Qua2VrLmpwCiBLZWl0aCBCb3N0aWMgICAgICAgICAgICAgICAgICAgICAg
ICBib3N0aWNAYnNkaS5jb20KK0tvZW4gQ2xhZXNzZW4gICAgICAgICAgICAgICAgICAgICAgIGtv
ZW5AY2hhbG1lcnMuc2UKIEtyaXNobmEgU2V0aHVyYW1hbiAgICAgICAgICAgICAgICAgIGtyaXNo
bmFAc2dpaHViLmNvcnAuc2dpLmNvbQogS3VydCBEIFNjaHdlaHIgICAgICAgICAgICAgICAgICAg
ICAga2RzY2h3ZWhAaW5zY2kxNC51Y3NkLmVkdQogTHVkb3ZpYyBDb3VydMOocyAgICAgICAgICAg
ICAgICAgICAgIGx1ZG9AZ251Lm9yZwpkaWZmIC0tZ2l0IGEvdGVzdHMvTWFrZWZpbGUuYW0gYi90
ZXN0cy9NYWtlZmlsZS5hbQppbmRleCBmMTk1YzhkLi4wYzNhZGJlIDEwMDY0NAotLS0gYS90ZXN0
cy9NYWtlZmlsZS5hbQorKysgYi90ZXN0cy9NYWtlZmlsZS5hbQpAQCAtNDksNiArNDksOCBAQCBM
REFERCA9IC4uL2xpYi9saWJncmVwdXRpbHMuYSAkKExJQklOVEwpIC4uL2xpYi9saWJncmVwdXRp
bHMuYQogIyBGSVhNRS0yMDE1OiBSZW1vdmUgdGhpcyBvbmNlIHRoZSBnbGliYyBhbmQgZ251bGli
IGJ1Z3MgYXJlIGZpeGVkLgogWEZBSUxfVEVTVFMgPSB0cmlwbGUtYmFja3JlZgoKK1hGQUlMX1RF
U1RTICs9IGdsaWJjLWluZmxvb3AKKwogIyBFcXVpdmFsZW5jZSBjbGFzc2VzIGFyZSBvbmx5IHN1
cHBvcnRlZCB3aGVuIHVzaW5nIHRoZSBzeXN0ZW0KICMgbWF0Y2hlciAod2hpY2ggbWVhbnMgb25s
eSB3aXRoIGdsaWJjKS4KICMgVGhlIGluY2x1ZGVkIG1hdGNoZXIgbmVlZHMgdG8gYmUgZml4ZWQu
CkBAIC0xMDgsNiArMTEwLDcgQEAgVEVTVFMgPQkJCQkJCVwKICAgZmlsbGJ1Zi1sb25nLWxpbmUJ
CQkJXAogICBmbWJ0ZXN0CQkJCQlcCiAgIGZvYWQxCQkJCQkJXAorICBnbGliYy1pbmZsb29wCQkJ
CQlcCiAgIGdyZXAtZGV2LW51bGwJCQkJCVwKICAgZ3JlcC1kZXYtbnVsbC1vdXQJCQkJXAogICBn
cmVwLWRpcgkJCQkJXApkaWZmIC0tZ2l0IGEvdGVzdHMvZ2xpYmMtaW5mbG9vcCBiL3Rlc3RzL2ds
aWJjLWluZmxvb3AKbmV3IGZpbGUgbW9kZSAxMDA3NTUKaW5kZXggMDAwMDAwMC4uNjI0NGI2Ygot
LS0gL2Rldi9udWxsCisrKyBiL3Rlc3RzL2dsaWJjLWluZmxvb3AKQEAgLTAsMCArMSwxMyBAQAor
IyEvYmluL3NoCisjIFRoaXMgd291bGQgaW5mbG9vcCB3aGVuIHVzaW5nIGdsaWJjJ3MgcmVnZXgg
YXQgbGVhc3QgdW50aWwgZ2xpYmMtMi4zNi4KKy4gIiR7c3JjZGlyPS59L2luaXQuc2giOyBwYXRo
X3ByZXBlbmRfIC4uL3NyYworCityZXF1aXJlX3RpbWVvdXRfCityZXF1aXJlX2VuX3V0ZjhfbG9j
YWxlXworCitmYWlsPTAKKworZWNobyBhID4gaW4gfHwgZnJhbWV3b3JrX2ZhaWx1cmVfCityZXR1
cm5zXyAwIHRpbWVvdXQgMiBlbnYgTENfQUxMPWVuX1VTLlVURi04IGdyZXAgLUUgLXcgJygoKCl8
YSl8KCkpKicgaW4gfHwgZmFpbD0xCisKK0V4aXQgJGZhaWwKLS0gCjIuNDAuMC4xNjYuZzE0MGI5
NDc4ZGEKCg==
--000000000000bfb5b205f852b0a6--




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at 62483 <at> debbugs.gnu.org:


Received: (at 62483) by debbugs.gnu.org; 31 Mar 2023 15:25:45 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Mar 31 11:25:45 2023
Received: from localhost ([127.0.0.1]:34254 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piGdM-0007El-K8
	for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:45 -0400
Received: from tensor.andric.com ([87.251.56.140]:64446 ident=postfix)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1piGdI-0007EW-7c
 for 62483 <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:42 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 559306059D;
 Fri, 31 Mar 2023 17:25:37 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1680276337; bh=tv3wdsLbUzJuawKCrRN9gRq8xh79liGVtP2VlXgo/zE=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=Nz2cHPImbM0SK6j1NdD5Aygk3RL4epuqwPgTaTrK8MurzsrOZO/aL+KQbRgKDBhbj
 aplfthjlXLsiJZDVL2pSxwJVm4yxAp7BR65GpLgbw2PUwYPVGA4Hx3T4CCKcuH/M+R
 p85inkdVWLNGlChp1juhxPiFh2YP1gtM31fXWxnU8876mdCg1/vBX1C4O3bYA9g7eA
 cQkApkFp7WRvHQpBYsuiYsFA1e5QyXFSbf6ARiPW5O9EWj/MxPeymomqcOYvoj+k43
 /NU1qEh1dbeXftTI+hKE+QfbZO3MWzD3rSKS1tz0nRW8/kPQB2qwKW6ZDDWY5lwk42
 JgFZiA0j/f23g==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202303280646.32S6krUr030677@HIDDEN>
Date: Fri, 31 Mar 2023 17:25:21 +0200
Message-Id: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
To: arnold@HIDDEN
X-Mailer: Apple Mail (2.3731.500.231)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)


--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Yes, it still reproduces when I configure the latest grep using
--without-included-regex:

1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0
(gdb)
1428          cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
1: idx =3D 0
(gdb)
1432          if (__glibc_unlikely (cur_node < 0))
1: idx =3D 0
(gdb)
1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0

The endless loop looks the same. grep's regexec.c is mostly the same as
glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
directives added.

-Dimitry

> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>=20
> This does not reproduce with gawk, even when I force use of the regex
> matcher.
>=20
> What happens if grep is built with the included regex files instead of
> relying on the ones in the local glibc?
>=20
> Arnold
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>=20
>>> Running the command:
>>>=20
>>> echo a | grep -E -w '((()|a)|())*'
>>>=20
>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>> grep I have tried.
>>>=20
>>> This is the smallest case that could be found; simplifying anything =
in the
>>> input and/or expression leads to correct behavior.
>>=20
>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>=20
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ =
COMMAND
>>  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 =
grep -E -w ((()|a)|())*
>>=20
>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>=20
>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>=20
>>  1378  static reg_errcode_t
>>  1379  __attribute_warn_unused_result__
>>  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, =
size_t nmatch,
>>  1381            regmatch_t *pmatch, bool fl_backtrack)
>>  1382  {
>> ...
>>  1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>  1416      {
>>  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
>>  1418
>>  1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>  1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> ...
>>  1442        /* Proceed to next node.  */
>>  1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>  1444                                      &idx, cur_node,
>>  1445                                      &eps_via_nodes, fs);
>>  1446
>>  1447        if (__glibc_unlikely (cur_node < 0))
>> ...
>>  1465          }
>>  1466      }
>>=20
>> -Dimitry
>>=20
>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>=20


--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCb7YQAKCRCwXqMKLiCW
o+dZAJoC/FUUV4nkFDYncYF4RZt9TDj6FACfZpp9LHb3L1pn/aWcAXlxCWJKnU0=
=R/1i
-----END PGP SIGNATURE-----

--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2--




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 31 Mar 2023 15:25:51 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Mar 31 11:25:51 2023
Received: from localhost ([127.0.0.1]:34257 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piGdT-0007F5-3t
	for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:51 -0400
Received: from lists.gnu.org ([209.51.188.17]:56940)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1piGdN-0007Et-MP
 for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:46 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
 id 1piGdL-0008Od-T2
 for bug-grep@HIDDEN; Fri, 31 Mar 2023 11:25:45 -0400
Received: from tensor.andric.com ([87.251.56.140])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
 id 1piGdJ-0007lc-Tf
 for bug-grep@HIDDEN; Fri, 31 Mar 2023 11:25:43 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 559306059D;
 Fri, 31 Mar 2023 17:25:37 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1680276337; bh=tv3wdsLbUzJuawKCrRN9gRq8xh79liGVtP2VlXgo/zE=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=Nz2cHPImbM0SK6j1NdD5Aygk3RL4epuqwPgTaTrK8MurzsrOZO/aL+KQbRgKDBhbj
 aplfthjlXLsiJZDVL2pSxwJVm4yxAp7BR65GpLgbw2PUwYPVGA4Hx3T4CCKcuH/M+R
 p85inkdVWLNGlChp1juhxPiFh2YP1gtM31fXWxnU8876mdCg1/vBX1C4O3bYA9g7eA
 cQkApkFp7WRvHQpBYsuiYsFA1e5QyXFSbf6ARiPW5O9EWj/MxPeymomqcOYvoj+k43
 /NU1qEh1dbeXftTI+hKE+QfbZO3MWzD3rSKS1tz0nRW8/kPQB2qwKW6ZDDWY5lwk42
 JgFZiA0j/f23g==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202303280646.32S6krUr030677@HIDDEN>
Date: Fri, 31 Mar 2023 17:25:21 +0200
Message-Id: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
To: arnold@HIDDEN
X-Mailer: Apple Mail (2.3731.500.231)
Received-SPF: pass client-ip=87.251.56.140; envelope-from=dimitry@HIDDEN;
 helo=tensor.andric.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Debbugs-Envelope-To: submit
Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)


--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Yes, it still reproduces when I configure the latest grep using
--without-included-regex:

1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0
(gdb)
1428          cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
1: idx =3D 0
(gdb)
1432          if (__glibc_unlikely (cur_node < 0))
1: idx =3D 0
(gdb)
1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0

The endless loop looks the same. grep's regexec.c is mostly the same as
glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
directives added.

-Dimitry

> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>=20
> This does not reproduce with gawk, even when I force use of the regex
> matcher.
>=20
> What happens if grep is built with the included regex files instead of
> relying on the ones in the local glibc?
>=20
> Arnold
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>=20
>>> Running the command:
>>>=20
>>> echo a | grep -E -w '((()|a)|())*'
>>>=20
>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>> grep I have tried.
>>>=20
>>> This is the smallest case that could be found; simplifying anything =
in the
>>> input and/or expression leads to correct behavior.
>>=20
>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>=20
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ =
COMMAND
>>  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 =
grep -E -w ((()|a)|())*
>>=20
>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>=20
>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>=20
>>  1378  static reg_errcode_t
>>  1379  __attribute_warn_unused_result__
>>  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, =
size_t nmatch,
>>  1381            regmatch_t *pmatch, bool fl_backtrack)
>>  1382  {
>> ...
>>  1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>  1416      {
>>  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
>>  1418
>>  1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>  1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> ...
>>  1442        /* Proceed to next node.  */
>>  1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>  1444                                      &idx, cur_node,
>>  1445                                      &eps_via_nodes, fs);
>>  1446
>>  1447        if (__glibc_unlikely (cur_node < 0))
>> ...
>>  1465          }
>>  1466      }
>>=20
>> -Dimitry
>>=20
>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>=20


--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCb7YQAKCRCwXqMKLiCW
o+dZAJoC/FUUV4nkFDYncYF4RZt9TDj6FACfZpp9LHb3L1pn/aWcAXlxCWJKnU0=
=R/1i
-----END PGP SIGNATURE-----

--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2--




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at 62483 <at> debbugs.gnu.org:


Received: (at 62483) by debbugs.gnu.org; 28 Mar 2023 06:47:07 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Mar 28 02:47:07 2023
Received: from localhost ([127.0.0.1]:48750 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ph36p-0001Yd-2p
	for submit <at> debbugs.gnu.org; Tue, 28 Mar 2023 02:47:07 -0400
Received: from frenzy.freefriends.org ([198.99.81.75]:41354
 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1ph36m-0001YU-4b
 for 62483 <at> debbugs.gnu.org; Tue, 28 Mar 2023 02:47:05 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 32S6ksEG030678
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Tue, 28 Mar 2023 00:46:55 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 32S6krUr030677;
 Tue, 28 Mar 2023 00:46:53 -0600
From: arnold@HIDDEN
Message-Id: <202303280646.32S6krUr030677@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Tue, 28 Mar 2023 00:46:53 -0600
To: koen@HIDDEN, dimitry@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
In-Reply-To: <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.0 (+)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

This does not reproduce with gawk, even when I force use of the regex
matcher.

What happens if grep is built with the included regex files instead of
relying on the ones in the local glibc?

Arnold

Dimitry Andric <dimitry@HIDDEN> wrote:

> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> > 
> > Running the command:
> > 
> >  echo a | grep -E -w '((()|a)|())*'
> > 
> > does not terminate, and uses a LOT of processor time, for all versions of
> > grep I have tried.
> > 
> > This is the smallest case that could be found; simplifying anything in the
> > input and/or expression leads to correct behavior.
>
> Reproducible with GNU grep 3.7 on Ubuntu 22:
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
>
> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
>
> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
>
>   1378  static reg_errcode_t
>   1379  __attribute_warn_unused_result__
>   1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
>   1381            regmatch_t *pmatch, bool fl_backtrack)
>   1382  {
> ...
>   1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>   1416      {
>   1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
>   1418
>   1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>   1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> ...
>   1442        /* Proceed to next node.  */
>   1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
>   1444                                      &idx, cur_node,
>   1445                                      &eps_via_nodes, fs);
>   1446
>   1447        if (__glibc_unlikely (cur_node < 0))
> ...
>   1465          }
>   1466      }
>
> -Dimitry
>
> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
>




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at 62483 <at> debbugs.gnu.org:


Received: (at 62483) by debbugs.gnu.org; 27 Mar 2023 17:54:48 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Mar 27 13:54:48 2023
Received: from localhost ([127.0.0.1]:48396 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pgr3P-0006af-MB
	for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 13:54:48 -0400
Received: from tensor.andric.com ([87.251.56.140]:61183 ident=postfix)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1pgr3O-0006aT-2P
 for 62483 <at> debbugs.gnu.org; Mon, 27 Mar 2023 13:54:47 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id C41BC316CF;
 Mon, 27 Mar 2023 19:54:42 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1679939682; bh=v6pn1ABwmo2Zk9WlVVTSX2HlDTRrAfOORf4/PvUrqeQ=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=L1SvqBeqaLZyAehXMuCh/SjMza9jiRzcc6wkwkKkhHpwRIOL4sLOoM6Adxphr00Mv
 xOL8dQszUSuyFlnwvqZUlYHmLFQ+pt73Y6zor1+WRuB4As5/OBjFlhOX0O8EEiel1a
 4eGWJAZR/FmRt7E9t7txMFx7spGuIY+3VLv80eDqwp908gbxEcFBKDhSjNz4v5BUXV
 j9ThNMHLQeqo+ask6CkYXCsHyidfXdi9Sy6rn8F23AMb6RbhD4UoL0T+kZFM2bRcC6
 7NCeb6tEcgKfRDGlNJgvpyBcdxyYTgE3iCgIP1L7BXpBhN5P6lA0j7K3OMeadi3Pc4
 XR7NOl0WTc3pw==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.400.51.1.1\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
Date: Mon, 27 Mar 2023 19:54:27 +0200
Message-Id: <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
To: Koen Claessen <koen@HIDDEN>
X-Mailer: Apple Mail (2.3731.400.51.1.1)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)


--Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>=20
> Running the command:
>=20
>  echo a | grep -E -w '((()|a)|())*'
>=20
> does not terminate, and uses a LOT of processor time, for all versions =
of
> grep I have tried.
>=20
> This is the smallest case that could be found; simplifying anything in =
the
> input and/or expression leads to correct behavior.

Reproducible with GNU grep 3.7 on Ubuntu 22:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ =
COMMAND
  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 =
grep -E -w ((()|a)|())*

It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.

It loops between lines 1415, 1417 and 1443, but idx and cur_node never =
change from 0:

  1378  static reg_errcode_t
  1379  __attribute_warn_unused_result__
  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, =
size_t nmatch,
  1381            regmatch_t *pmatch, bool fl_backtrack)
  1382  {
...
  1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
  1416      {
  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
  1418
  1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
  1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
...
  1442        /* Proceed to next node.  */
  1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
  1444                                      &idx, cur_node,
  1445                                      &eps_via_nodes, fs);
  1446
  1447        if (__glibc_unlikely (cur_node < 0))
...
  1465          }
  1466      }

-Dimitry

P.S.: Interestingly this does not reproduce with BSD grep, which returns =
immediately with "a".


--Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCHYUwAKCRCwXqMKLiCW
o+9rAKD66Kl1mgPSqRVaG1i0Y05AnBOdBACfYshg0HktpRuLAQQBLyyAQHYSENU=
=qC4Q
-----END PGP SIGNATURE-----

--Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23--




Information forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 27 Mar 2023 13:14:11 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Mar 27 09:14:11 2023
Received: from localhost ([127.0.0.1]:46774 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pgmfq-0005Kz-Vo
	for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 09:14:11 -0400
Received: from lists.gnu.org ([209.51.188.17]:59458)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <koen.claessen@HIDDEN>) id 1pgivx-0004UW-6G
 for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 05:14:33 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <koen.claessen@HIDDEN>)
 id 1pgivw-0002tj-DW
 for bug-grep@HIDDEN; Mon, 27 Mar 2023 05:14:32 -0400
Received: from mail-oi1-f175.google.com ([209.85.167.175])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <koen.claessen@HIDDEN>)
 id 1pgivu-0000lO-J0
 for bug-grep@HIDDEN; Mon, 27 Mar 2023 05:14:32 -0400
Received: by mail-oi1-f175.google.com with SMTP id bi31so5781083oib.9
 for <bug-grep@HIDDEN>; Mon, 27 Mar 2023 02:14:29 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112; t=1679908468;
 h=to:subject:message-id:date:from:mime-version:x-gm-message-state
 :from:to:cc:subject:date:message-id:reply-to;
 bh=I0nsVRHdMBEFAIktPr2QorHZIFkZGpyQddZo2ESBGJ8=;
 b=2KbYRm6gB/nY5JRtjAjJaxhmjLHPSJBO8Cy78zB7X/9GPD5KOF/l9yexPGoJaB9nY1
 43kJglMMfnl5tW61bsgJXRntfquVQhc5ksyWEkvo92vqmW6JbZZgr9CH99P7xAm1dzW9
 q9/iMRMAttFeav8z95fj1lrvGNrCQaQ6x8r86bs0a2K5jbIvmOdNi6fpA5A97mNpMkuk
 MHlXMX630nG1TqFihmHmzmxhsEHn+EEoPH1ghbEXSVLfXsm+x97xq088/pRTPVPnEscC
 jBQhrDStI6y8q+HtPWQ4Nl7YIWMd/KADw3+N9UH/+CQDQcNipBBtyAc3Btc4wk6umvAY
 yQsw==
X-Gm-Message-State: AO0yUKWR05lCph9vF+pSaIybDbqBKWRRVrJeEwzH0KBdtHCnzIspBUj/
 U+hWrGx2XjyYZBmUJaJ1E2qYlJKqSkrFn9E2NfbZc/ug+Q4=
X-Google-Smtp-Source: AK7set+FJwBk1ZfnerizAPa/0MtB+zNp+gLKRAc9psjQo/PlnuSfdZVsLPCMkXJvQJtu5+bxXOV+O9UTykY9L0q4dpc=
X-Received: by 2002:a05:6808:b38:b0:387:1afd:5924 with SMTP id
 t24-20020a0568080b3800b003871afd5924mr2930420oij.8.1679908468569; Mon, 27 Mar
 2023 02:14:28 -0700 (PDT)
MIME-Version: 1.0
From: Koen Claessen <koen@HIDDEN>
Date: Mon, 27 Mar 2023 11:14:17 +0200
Message-ID: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
Subject: echo a | grep -E -w '((()|a)|())*' # does not terminate
To: bug-grep@HIDDEN
Content-Type: multipart/alternative; boundary="000000000000863aec05f7de2b71"
Received-SPF: pass client-ip=209.85.167.175;
 envelope-from=koen.claessen@HIDDEN; helo=mail-oi1-f175.google.com
X-Spam_score_int: -13
X-Spam_score: -1.4
X-Spam_bar: -
X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.25,
 FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.25, HTML_MESSAGE=0.001,
 RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.1 (-)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Mon, 27 Mar 2023 09:14:05 -0400
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.1 (--)

--000000000000863aec05f7de2b71
Content-Type: text/plain; charset="UTF-8"

Hello!

Running the command:

  echo a | grep -E -w '((()|a)|())*'

does not terminate, and uses a LOT of processor time, for all versions of
grep I have tried.

This is the smallest case that could be found; simplifying anything in the
input and/or expression leads to correct behavior.

Kind regards,
/Koen

--000000000000863aec05f7de2b71
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hello!<div><br></div><div>Running the command:</div><div><=
br></div><div>=C2=A0=C2=A0echo a | grep -E -w &#39;((()|a)|())*&#39;</div><=
div><br></div><div>does not terminate, and uses a LOT of processor time, fo=
r all versions of grep I have tried.</div><div><br></div><div>This is the s=
mallest case that could be found; simplifying anything in the input and/or =
expression leads to correct behavior.</div><div><br></div><div>Kind regards=
,</div><div>/Koen</div></div>

--000000000000863aec05f7de2b71--




Acknowledgement sent to Koen Claessen <koen@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-grep@HIDDEN. Full text available.
Report forwarded to bug-grep@HIDDEN:
bug#62483; Package grep. Full text available.
Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.
Last modified: Sun, 2 Apr 2023 18:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.