Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 18:29:50 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 14:29:50 2023 Received: from localhost ([127.0.0.1]:42640 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pj2Sc-0004O8-7q for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 14:29:50 -0400 Received: from mail.cs.ucla.edu ([131.179.128.66]:39294) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eggert@HIDDEN>) id 1pj2SX-0004Ns-KW for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 14:29:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id AEBCE3C09FA02; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id oyIQv3a3PBiJ; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 752483C09FA01; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.cs.ucla.edu 752483C09FA01 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu; s=9D0B346E-2AEB-11ED-9476-E14B719DCE6C; t=1680460179; bh=2f3sGrU+5qn62ukHicLKudSRcxmJsffhDbks/B5+0OI=; h=Message-ID:Date:MIME-Version:To:From; b=mnBN4wAUHPlu43V8zbNB9T9pxfGf6HyO7gPzgnJgbfqqBXvW/gp9J17F6/nI3rVGT +rrkdp/1IUPB5eBvCcQVeBn2RpKvqC13iaomLjQ4AVagHIh4UuiftUMSw2cWamRB6W Y2dQr4gZNvJvDgfPnhopgZd8M4Vmb2Ndoa3j7c9n6TEAp74gomzh9DqX43neTH8sVs NlK6CuUZbnfuu3E69cBE1NrvW6XQqr9vz0xfj5qILzwTiilGWONSjPfOer4tt5nzRl ZLb3+e1lUwDk6I/8kVFh3q3CE7Py8BCBNZuXWPyDbQPObf+WY4giKoTmeImHUqXHFh 23i2Aa5w9ntag== X-Virus-Scanned: amavisd-new at mail.cs.ucla.edu Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id xT4zfD34fpKg; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) Received: from [192.168.1.9] (cpe-172-91-119-151.socal.res.rr.com [172.91.119.151]) by mail.cs.ucla.edu (Postfix) with ESMTPSA id 4A99F3C097AFE; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) Message-ID: <576a26bb-b195-dd33-a5a4-ba38115889d5@HIDDEN> Date: Sun, 2 Apr 2023 11:29:39 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.0 Content-Language: en-US To: arnold@HIDDEN, dimitry@HIDDEN References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> <202304020652.3326q2Ut024466@HIDDEN> From: Paul Eggert <eggert@HIDDEN> Organization: UCLA Computer Science Department Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.1 (-) X-Debbugs-Envelope-To: 62483 Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -2.1 (--) On 2023-04-01 23:52, arnold@HIDDEN wrote: > It's interesting, as gawk uses the same regex, but with different flags. Also, GNU grep -w passes the following more-complicated regexp to dfaparse: (^|[^[:alnum:]_])(((()|a)|())*)([^[:alnum:]_]|$) and quite possibly the bug is related to this extra complexity.
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 10:04:12 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 06:04:12 2023
Received: from localhost ([127.0.0.1]:39237 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1piuZI-0000fR-43
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:12 -0400
Received: from frenzy.freefriends.org ([198.99.81.75]:59844
helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <arnold@HIDDEN>) id 1piuZG-0000fJ-Gp
for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:11 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
by freefriends.org (8.14.7/8.14.7) with ESMTP id 332A3umP023104
(version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
Sun, 2 Apr 2023 04:03:57 -0600
Received: (from arnold@localhost)
by freefriends.org (8.14.7/8.14.7/Submit) id 332A3uGT023103;
Sun, 2 Apr 2023 04:03:56 -0600
From: arnold@HIDDEN
Message-Id: <202304021003.332A3uGT023103@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 04:03:56 -0600
To: dimitry@HIDDEN, arnold@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
<C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
<202303280646.32S6krUr030677@HIDDEN>
<EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
<202304020652.3326q2Ut024466@HIDDEN>
<4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
In-Reply-To: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.0 (+)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)
OK, thanks!
Dimitry Andric <dimitry@HIDDEN> wrote:
> Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
> debugging purposes added CFLAGS="-O0 -g".
>
> In any case, the problematic code is both in glibc and grep, as I
> believe these are originating from the same source.
>
> -Dimitry
>
> > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
> >
> > Hi.
> >
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> >
> >> Yes, it still reproduces when I configure the latest grep using
> >> --without-included-regex:
> >
> > I assume you meant --with-included-regex?
> >
> > If you really used --without-included-regex, that doesn't prove anything... :-)
> >
> > It's interesting, as gawk uses the same regex, but with different flags.
> > It might be worth trying to understand which of the syntax bits
> > is causing this.
> >
> > Thanks,
> >
> > Arnold
> >
> >>
> >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> (gdb)
> >> 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >> 1: idx = 0
> >> (gdb)
> >> 1432 if (__glibc_unlikely (cur_node < 0))
> >> 1: idx = 0
> >> (gdb)
> >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >>
> >> The endless loop looks the same. grep's regexec.c is mostly the same as
> >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> >> directives added.
> >>
> >> -Dimitry
> >>
> >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> >>>
> >>> This does not reproduce with gawk, even when I force use of the regex
> >>> matcher.
> >>>
> >>> What happens if grep is built with the included regex files instead of
> >>> relying on the ones in the local glibc?
> >>>
> >>> Arnold
> >>>
> >>> Dimitry Andric <dimitry@HIDDEN> wrote:
> >>>
> >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>>>>
> >>>>> Running the command:
> >>>>>
> >>>>> echo a | grep -E -w '((()|a)|())*'
> >>>>>
> >>>>> does not terminate, and uses a LOT of processor time, for all versions of
> >>>>> grep I have tried.
> >>>>>
> >>>>> This is the smallest case that could be found; simplifying anything in the
> >>>>> input and/or expression leads to correct behavior.
> >>>>
> >>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>>>
> >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())*
> >>>>
> >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >>>>
> >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >>>>
> >>>> 1378 static reg_errcode_t
> >>>> 1379 __attribute_warn_unused_result__
> >>>> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>>> 1381 regmatch_t *pmatch, bool fl_backtrack)
> >>>> 1382 {
> >>>> ...
> >>>> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>>> 1416 {
> >>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>>> 1418
> >>>> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >>>> ...
> >>>> 1442 /* Proceed to next node. */
> >>>> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>>> 1444 &idx, cur_node,
> >>>> 1445 &eps_via_nodes, fs);
> >>>> 1446
> >>>> 1447 if (__glibc_unlikely (cur_node < 0))
> >>>> ...
> >>>> 1465 }
> >>>> 1466 }
> >>>>
> >>>> -Dimitry
> >>>>
> >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >>>>
> >>
>
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 10:04:18 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 06:04:18 2023
Received: from localhost ([127.0.0.1]:39240 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1piuZN-0000fk-QN
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:18 -0400
Received: from lists.gnu.org ([209.51.188.17]:39736)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <arnold@HIDDEN>) id 1piuZL-0000fb-ID
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:16 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1piuZK-000719-JP
for bug-grep@HIDDEN; Sun, 02 Apr 2023 06:04:14 -0400
Received: from frenzy.freefriends.org ([198.99.81.75] helo=freefriends.org)
by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1piuZI-0002di-Og
for bug-grep@HIDDEN; Sun, 02 Apr 2023 06:04:14 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
by freefriends.org (8.14.7/8.14.7) with ESMTP id 332A3umP023104
(version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
Sun, 2 Apr 2023 04:03:57 -0600
Received: (from arnold@localhost)
by freefriends.org (8.14.7/8.14.7/Submit) id 332A3uGT023103;
Sun, 2 Apr 2023 04:03:56 -0600
From: arnold@HIDDEN
Message-Id: <202304021003.332A3uGT023103@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 04:03:56 -0600
To: dimitry@HIDDEN, arnold@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
<C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
<202303280646.32S6krUr030677@HIDDEN>
<EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
<202304020652.3326q2Ut024466@HIDDEN>
<4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
In-Reply-To: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Received-SPF: none client-ip=198.99.81.75; envelope-from=arnold@HIDDEN;
helo=freefriends.org
X-Spam_score_int: -14
X-Spam_score: -1.5
X-Spam_bar: -
X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, FORGED_SPF_HELO=0.001,
KHOP_HELO_FCRDNS=0.399, SPF_HELO_PASS=-0.001,
SPF_NONE=0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)
OK, thanks!
Dimitry Andric <dimitry@HIDDEN> wrote:
> Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
> debugging purposes added CFLAGS="-O0 -g".
>
> In any case, the problematic code is both in glibc and grep, as I
> believe these are originating from the same source.
>
> -Dimitry
>
> > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
> >
> > Hi.
> >
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> >
> >> Yes, it still reproduces when I configure the latest grep using
> >> --without-included-regex:
> >
> > I assume you meant --with-included-regex?
> >
> > If you really used --without-included-regex, that doesn't prove anything... :-)
> >
> > It's interesting, as gawk uses the same regex, but with different flags.
> > It might be worth trying to understand which of the syntax bits
> > is causing this.
> >
> > Thanks,
> >
> > Arnold
> >
> >>
> >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> (gdb)
> >> 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >> 1: idx = 0
> >> (gdb)
> >> 1432 if (__glibc_unlikely (cur_node < 0))
> >> 1: idx = 0
> >> (gdb)
> >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >>
> >> The endless loop looks the same. grep's regexec.c is mostly the same as
> >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> >> directives added.
> >>
> >> -Dimitry
> >>
> >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> >>>
> >>> This does not reproduce with gawk, even when I force use of the regex
> >>> matcher.
> >>>
> >>> What happens if grep is built with the included regex files instead of
> >>> relying on the ones in the local glibc?
> >>>
> >>> Arnold
> >>>
> >>> Dimitry Andric <dimitry@HIDDEN> wrote:
> >>>
> >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>>>>
> >>>>> Running the command:
> >>>>>
> >>>>> echo a | grep -E -w '((()|a)|())*'
> >>>>>
> >>>>> does not terminate, and uses a LOT of processor time, for all versions of
> >>>>> grep I have tried.
> >>>>>
> >>>>> This is the smallest case that could be found; simplifying anything in the
> >>>>> input and/or expression leads to correct behavior.
> >>>>
> >>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>>>
> >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())*
> >>>>
> >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >>>>
> >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >>>>
> >>>> 1378 static reg_errcode_t
> >>>> 1379 __attribute_warn_unused_result__
> >>>> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>>> 1381 regmatch_t *pmatch, bool fl_backtrack)
> >>>> 1382 {
> >>>> ...
> >>>> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>>> 1416 {
> >>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>>> 1418
> >>>> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >>>> ...
> >>>> 1442 /* Proceed to next node. */
> >>>> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>>> 1444 &idx, cur_node,
> >>>> 1445 &eps_via_nodes, fs);
> >>>> 1446
> >>>> 1447 if (__glibc_unlikely (cur_node < 0))
> >>>> ...
> >>>> 1465 }
> >>>> 1466 }
> >>>>
> >>>> -Dimitry
> >>>>
> >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >>>>
> >>
>
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 09:34:02 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 05:34:02 2023
Received: from localhost ([127.0.0.1]:39227 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1piu65-0008AY-Hx
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:34:02 -0400
Received: from lists.gnu.org ([209.51.188.17]:45166)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <dimitry@HIDDEN>) id 1piu64-0008AQ-1g
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:34:00 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
id 1piu62-0002Ka-6N
for bug-grep@HIDDEN; Sun, 02 Apr 2023 05:33:59 -0400
Received: from tensor.andric.com ([87.251.56.140])
by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
id 1piu5y-00074R-0e
for bug-grep@HIDDEN; Sun, 02 Apr 2023 05:33:55 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(No client certificate requested)
by tensor.andric.com (Postfix) with ESMTPSA id 6ACDF6C235;
Sun, 2 Apr 2023 11:33:48 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
t=1680428028; bh=j++piMyInCb64AqGnHXFa/agOcQZYX5hFJCloNa82XE=;
h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
b=OvpoxsYZyxsgQkhRl/xlrsYK62rT5hKW4sMbC0u3aDCYe7gYbalcM0EcbVcuZmsEp
Ajq9eUTN6tzSLEXBOVFlEwFzoS/gkuLXqmKFqU4TKOJpsjfoXca+iNS3RN7w7k6HAP
FyHBj1TZpHtlFZQW6GWjX1LTGv9+R+XmXi7cSlUAHzDwjONBsFO40E29XbZDoqHYdD
xzUcUJSN1hpy7an+xzhBje/vvne3qpIEbReGf/Aab0z0j/KCjM6GT81+SKIog+el7H
vLVb8tU69x6+y1CyoGlsfZ0IY17+aUdgLalblC2o7G4l8UObQXMmN1jOZ7QLq9QPgr
OWfWv7UbnXYdg==
Content-Type: multipart/signed;
boundary="Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13";
protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN>
Date: Sun, 2 Apr 2023 11:33:42 +0200
Message-Id: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
<C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
<202303280646.32S6krUr030677@HIDDEN>
<EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
<202304020652.3326q2Ut024466@HIDDEN>
To: arnold@HIDDEN
X-Mailer: Apple Mail (2.3731.500.231)
Received-SPF: pass client-ip=87.251.56.140; envelope-from=dimitry@HIDDEN;
helo=tensor.andric.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,
SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Debbugs-Envelope-To: submit
Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)
--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=us-ascii
Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
debugging purposes added CFLAGS=3D"-O0 -g".
In any case, the problematic code is both in glibc and grep, as I
believe these are originating from the same source.
-Dimitry
> On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
>=20
> Hi.
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> Yes, it still reproduces when I configure the latest grep using
>> --without-included-regex:
>=20
> I assume you meant --with-included-regex?
>=20
> If you really used --without-included-regex, that doesn't prove =
anything... :-)
>=20
> It's interesting, as gawk uses the same regex, but with different =
flags.
> It might be worth trying to understand which of the syntax bits
> is causing this.
>=20
> Thanks,
>=20
> Arnold
>=20
>>=20
>> 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>> (gdb)
>> 1428 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>> 1: idx =3D 0
>> (gdb)
>> 1432 if (__glibc_unlikely (cur_node < 0))
>> 1: idx =3D 0
>> (gdb)
>> 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>>=20
>> The endless loop looks the same. grep's regexec.c is mostly the same =
as
>> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
>> directives added.
>>=20
>> -Dimitry
>>=20
>>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>>>=20
>>> This does not reproduce with gawk, even when I force use of the =
regex
>>> matcher.
>>>=20
>>> What happens if grep is built with the included regex files instead =
of
>>> relying on the ones in the local glibc?
>>>=20
>>> Arnold
>>>=20
>>> Dimitry Andric <dimitry@HIDDEN> wrote:
>>>=20
>>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>>>=20
>>>>> Running the command:
>>>>>=20
>>>>> echo a | grep -E -w '((()|a)|())*'
>>>>>=20
>>>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>>>> grep I have tried.
>>>>>=20
>>>>> This is the smallest case that could be found; simplifying =
anything in the
>>>>> input and/or expression leads to correct behavior.
>>>>=20
>>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>>>=20
>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM =
TIME+ COMMAND
>>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 =
0:08.32 grep -E -w ((()|a)|())*
>>>>=20
>>>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>>>=20
>>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>>>=20
>>>> 1378 static reg_errcode_t
>>>> 1379 __attribute_warn_unused_result__
>>>> 1380 set_regs (const regex_t *preg, const re_match_context_t =
*mctx, size_t nmatch,
>>>> 1381 regmatch_t *pmatch, bool fl_backtrack)
>>>> 1382 {
>>>> ...
>>>> 1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>>> 1416 {
>>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>>>> 1418
>>>> 1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>>>> ...
>>>> 1442 /* Proceed to next node. */
>>>> 1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>>> 1444 &idx, cur_node,
>>>> 1445 &eps_via_nodes, fs);
>>>> 1446
>>>> 1447 if (__glibc_unlikely (cur_node < 0))
>>>> ...
>>>> 1465 }
>>>> 1466 }
>>>>=20
>>>> -Dimitry
>>>>=20
>>>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>>>=20
>>=20
--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename=signature.asc
Content-Type: application/pgp-signature;
name=signature.asc
Content-Description: Message signed with OpenPGP
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2
iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZClL9gAKCRCwXqMKLiCW
o0KTAKCiIhcduUQaGUuiJfykNQNnR6+9UACg7EEpahAec8VtdrBL/qhtw0BjprA=
=F/9f
-----END PGP SIGNATURE-----
--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13--
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 09:33:54 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 05:33:54 2023
Received: from localhost ([127.0.0.1]:39224 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1piu5x-0008AC-Vd
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:33:54 -0400
Received: from tensor.andric.com ([87.251.56.140]:53462 ident=postfix)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <dimitry@HIDDEN>) id 1piu5u-0008A0-Gk
for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:33:52 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(No client certificate requested)
by tensor.andric.com (Postfix) with ESMTPSA id 6ACDF6C235;
Sun, 2 Apr 2023 11:33:48 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
t=1680428028; bh=j++piMyInCb64AqGnHXFa/agOcQZYX5hFJCloNa82XE=;
h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
b=OvpoxsYZyxsgQkhRl/xlrsYK62rT5hKW4sMbC0u3aDCYe7gYbalcM0EcbVcuZmsEp
Ajq9eUTN6tzSLEXBOVFlEwFzoS/gkuLXqmKFqU4TKOJpsjfoXca+iNS3RN7w7k6HAP
FyHBj1TZpHtlFZQW6GWjX1LTGv9+R+XmXi7cSlUAHzDwjONBsFO40E29XbZDoqHYdD
xzUcUJSN1hpy7an+xzhBje/vvne3qpIEbReGf/Aab0z0j/KCjM6GT81+SKIog+el7H
vLVb8tU69x6+y1CyoGlsfZ0IY17+aUdgLalblC2o7G4l8UObQXMmN1jOZ7QLq9QPgr
OWfWv7UbnXYdg==
Content-Type: multipart/signed;
boundary="Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13";
protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN>
Date: Sun, 2 Apr 2023 11:33:42 +0200
Message-Id: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
<C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
<202303280646.32S6krUr030677@HIDDEN>
<EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
<202304020652.3326q2Ut024466@HIDDEN>
To: arnold@HIDDEN
X-Mailer: Apple Mail (2.3731.500.231)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)
--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=us-ascii
Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
debugging purposes added CFLAGS=3D"-O0 -g".
In any case, the problematic code is both in glibc and grep, as I
believe these are originating from the same source.
-Dimitry
> On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
>=20
> Hi.
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> Yes, it still reproduces when I configure the latest grep using
>> --without-included-regex:
>=20
> I assume you meant --with-included-regex?
>=20
> If you really used --without-included-regex, that doesn't prove =
anything... :-)
>=20
> It's interesting, as gawk uses the same regex, but with different =
flags.
> It might be worth trying to understand which of the syntax bits
> is causing this.
>=20
> Thanks,
>=20
> Arnold
>=20
>>=20
>> 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>> (gdb)
>> 1428 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>> 1: idx =3D 0
>> (gdb)
>> 1432 if (__glibc_unlikely (cur_node < 0))
>> 1: idx =3D 0
>> (gdb)
>> 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>>=20
>> The endless loop looks the same. grep's regexec.c is mostly the same =
as
>> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
>> directives added.
>>=20
>> -Dimitry
>>=20
>>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>>>=20
>>> This does not reproduce with gawk, even when I force use of the =
regex
>>> matcher.
>>>=20
>>> What happens if grep is built with the included regex files instead =
of
>>> relying on the ones in the local glibc?
>>>=20
>>> Arnold
>>>=20
>>> Dimitry Andric <dimitry@HIDDEN> wrote:
>>>=20
>>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>>>=20
>>>>> Running the command:
>>>>>=20
>>>>> echo a | grep -E -w '((()|a)|())*'
>>>>>=20
>>>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>>>> grep I have tried.
>>>>>=20
>>>>> This is the smallest case that could be found; simplifying =
anything in the
>>>>> input and/or expression leads to correct behavior.
>>>>=20
>>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>>>=20
>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM =
TIME+ COMMAND
>>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 =
0:08.32 grep -E -w ((()|a)|())*
>>>>=20
>>>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>>>=20
>>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>>>=20
>>>> 1378 static reg_errcode_t
>>>> 1379 __attribute_warn_unused_result__
>>>> 1380 set_regs (const regex_t *preg, const re_match_context_t =
*mctx, size_t nmatch,
>>>> 1381 regmatch_t *pmatch, bool fl_backtrack)
>>>> 1382 {
>>>> ...
>>>> 1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>>> 1416 {
>>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>>>> 1418
>>>> 1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>>>> ...
>>>> 1442 /* Proceed to next node. */
>>>> 1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>>> 1444 &idx, cur_node,
>>>> 1445 &eps_via_nodes, fs);
>>>> 1446
>>>> 1447 if (__glibc_unlikely (cur_node < 0))
>>>> ...
>>>> 1465 }
>>>> 1466 }
>>>>=20
>>>> -Dimitry
>>>>=20
>>>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>>>=20
>>=20
--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename=signature.asc
Content-Type: application/pgp-signature;
name=signature.asc
Content-Description: Message signed with OpenPGP
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2
iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZClL9gAKCRCwXqMKLiCW
o0KTAKCiIhcduUQaGUuiJfykNQNnR6+9UACg7EEpahAec8VtdrBL/qhtw0BjprA=
=F/9f
-----END PGP SIGNATURE-----
--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13--
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 06:52:22 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 02:52:22 2023
Received: from localhost ([127.0.0.1]:39032 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1pirZe-0003Td-3X
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:22 -0400
Received: from frenzy.freefriends.org ([198.99.81.75]:57092
helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <arnold@HIDDEN>) id 1pirZb-0003TU-Kl
for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:20 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
by freefriends.org (8.14.7/8.14.7) with ESMTP id 3326q3Cs024467
(version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
Sun, 2 Apr 2023 00:52:04 -0600
Received: (from arnold@localhost)
by freefriends.org (8.14.7/8.14.7/Submit) id 3326q2Ut024466;
Sun, 2 Apr 2023 00:52:02 -0600
From: arnold@HIDDEN
Message-Id: <202304020652.3326q2Ut024466@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 00:52:02 -0600
To: dimitry@HIDDEN, arnold@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
<C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
<202303280646.32S6krUr030677@HIDDEN>
<EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
In-Reply-To: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.0 (+)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)
Hi.
Dimitry Andric <dimitry@HIDDEN> wrote:
> Yes, it still reproduces when I configure the latest grep using
> --without-included-regex:
I assume you meant --with-included-regex?
If you really used --without-included-regex, that doesn't prove anything... :-)
It's interesting, as gawk uses the same regex, but with different flags.
It might be worth trying to understand which of the syntax bits
is causing this.
Thanks,
Arnold
>
> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
> (gdb)
> 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> 1: idx = 0
> (gdb)
> 1432 if (__glibc_unlikely (cur_node < 0))
> 1: idx = 0
> (gdb)
> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
>
> The endless loop looks the same. grep's regexec.c is mostly the same as
> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> directives added.
>
> -Dimitry
>
> > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> >
> > This does not reproduce with gawk, even when I force use of the regex
> > matcher.
> >
> > What happens if grep is built with the included regex files instead of
> > relying on the ones in the local glibc?
> >
> > Arnold
> >
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> >
> >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>>
> >>> Running the command:
> >>>
> >>> echo a | grep -E -w '((()|a)|())*'
> >>>
> >>> does not terminate, and uses a LOT of processor time, for all versions of
> >>> grep I have tried.
> >>>
> >>> This is the smallest case that could be found; simplifying anything in the
> >>> input and/or expression leads to correct behavior.
> >>
> >> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>
> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())*
> >>
> >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >>
> >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >>
> >> 1378 static reg_errcode_t
> >> 1379 __attribute_warn_unused_result__
> >> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >> 1381 regmatch_t *pmatch, bool fl_backtrack)
> >> 1382 {
> >> ...
> >> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1416 {
> >> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1418
> >> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> ...
> >> 1442 /* Proceed to next node. */
> >> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >> 1444 &idx, cur_node,
> >> 1445 &eps_via_nodes, fs);
> >> 1446
> >> 1447 if (__glibc_unlikely (cur_node < 0))
> >> ...
> >> 1465 }
> >> 1466 }
> >>
> >> -Dimitry
> >>
> >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >>
>
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 06:52:31 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 02:52:31 2023
Received: from localhost ([127.0.0.1]:39035 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1pirZm-0003Tz-R9
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:31 -0400
Received: from lists.gnu.org ([209.51.188.17]:50340)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <arnold@HIDDEN>) id 1pirZk-0003To-83
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:28 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1pirZh-0008FE-Kz
for bug-grep@HIDDEN; Sun, 02 Apr 2023 02:52:25 -0400
Received: from frenzy.freefriends.org ([198.99.81.75] helo=freefriends.org)
by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1pirZf-0006yO-Kg
for bug-grep@HIDDEN; Sun, 02 Apr 2023 02:52:25 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
by freefriends.org (8.14.7/8.14.7) with ESMTP id 3326q3Cs024467
(version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
Sun, 2 Apr 2023 00:52:04 -0600
Received: (from arnold@localhost)
by freefriends.org (8.14.7/8.14.7/Submit) id 3326q2Ut024466;
Sun, 2 Apr 2023 00:52:02 -0600
From: arnold@HIDDEN
Message-Id: <202304020652.3326q2Ut024466@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 00:52:02 -0600
To: dimitry@HIDDEN, arnold@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
<C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
<202303280646.32S6krUr030677@HIDDEN>
<EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
In-Reply-To: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Received-SPF: none client-ip=198.99.81.75; envelope-from=arnold@HIDDEN;
helo=freefriends.org
X-Spam_score_int: -14
X-Spam_score: -1.5
X-Spam_bar: -
X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, FORGED_SPF_HELO=0.001,
KHOP_HELO_FCRDNS=0.399, SPF_HELO_PASS=-0.001,
SPF_NONE=0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)
Hi.
Dimitry Andric <dimitry@HIDDEN> wrote:
> Yes, it still reproduces when I configure the latest grep using
> --without-included-regex:
I assume you meant --with-included-regex?
If you really used --without-included-regex, that doesn't prove anything... :-)
It's interesting, as gawk uses the same regex, but with different flags.
It might be worth trying to understand which of the syntax bits
is causing this.
Thanks,
Arnold
>
> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
> (gdb)
> 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> 1: idx = 0
> (gdb)
> 1432 if (__glibc_unlikely (cur_node < 0))
> 1: idx = 0
> (gdb)
> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
>
> The endless loop looks the same. grep's regexec.c is mostly the same as
> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> directives added.
>
> -Dimitry
>
> > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> >
> > This does not reproduce with gawk, even when I force use of the regex
> > matcher.
> >
> > What happens if grep is built with the included regex files instead of
> > relying on the ones in the local glibc?
> >
> > Arnold
> >
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> >
> >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>>
> >>> Running the command:
> >>>
> >>> echo a | grep -E -w '((()|a)|())*'
> >>>
> >>> does not terminate, and uses a LOT of processor time, for all versions of
> >>> grep I have tried.
> >>>
> >>> This is the smallest case that could be found; simplifying anything in the
> >>> input and/or expression leads to correct behavior.
> >>
> >> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>
> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())*
> >>
> >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >>
> >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >>
> >> 1378 static reg_errcode_t
> >> 1379 __attribute_warn_unused_result__
> >> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >> 1381 regmatch_t *pmatch, bool fl_backtrack)
> >> 1382 {
> >> ...
> >> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1416 {
> >> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1418
> >> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> ...
> >> 1442 /* Proceed to next node. */
> >> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >> 1444 &idx, cur_node,
> >> 1445 &eps_via_nodes, fs);
> >> 1446
> >> 1447 if (__glibc_unlikely (cur_node < 0))
> >> ...
> >> 1465 }
> >> 1466 }
> >>
> >> -Dimitry
> >>
> >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >>
>
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 04:15:29 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Apr 02 00:15:29 2023
Received: from localhost ([127.0.0.1]:38815 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1pip7p-00072N-2s
for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 00:15:29 -0400
Received: from mail-lj1-f177.google.com ([209.85.208.177]:41554)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <meyering@HIDDEN>) id 1pip7m-000728-2d
for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 00:15:27 -0400
Received: by mail-lj1-f177.google.com with SMTP id bx10so8705350ljb.8
for <62483 <at> debbugs.gnu.org>; Sat, 01 Apr 2023 21:15:26 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20210112; t=1680408920;
h=cc:to:subject:message-id:date:from:in-reply-to:references
:mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
:reply-to;
bh=DodkkdU3opFDpkTR7I38Cd8sH5Ml6OzzLr47sEPUB+w=;
b=xz/uM78Xeo+KZkaGtVa1P9mQjfLV1aivcei2WEfZNoOFMTOkBwpKonRG/ba1JbNr83
pYR+P1Z7IOyoWVpnhev0bwAe6bG4XFDHsU+cW10ImcfNNcmiK1ArWvyofz13j+DNKQ5n
/vWdkbrQkfucGebgx/sLK7Jdh8OA0tZt1gbZNJnPjeTPcopGYPDCigA2gGQoEBVawv7g
4x6kAG7avDpCkKtc8U9mnvqk+UI472MyheOtQNwoE3M/v2Hv8m9zsDYT6bHBxxGdDYle
4lcLUHMuIgzGnfkj3/hevOpflvPGZxjPyKTr8WEjYK3ZapFlvI9VQTWpnbJK5hO6Xz77
hAxw==
X-Gm-Message-State: AAQBX9e7GYXZxDzQFlrGuxweB9jVGPGIK9j3o02zJXm7N/ETVFXjGl10
mYjRz0gfA7+RtfFFOJlo5APyBjtu4o0gOz8KPHQ=
X-Google-Smtp-Source: AKy350ZblXwqXc5nXQDfPymdw4nPR2IN4imJcGePAqD1vZp9OXVu/8aqbvMVAAgh/JkXmi0+y0PDQvbteeLWQRYc1Jg=
X-Received: by 2002:a2e:95d8:0:b0:29a:9053:ed1b with SMTP id
y24-20020a2e95d8000000b0029a9053ed1bmr9738568ljh.3.1680408919857; Sat, 01 Apr
2023 21:15:19 -0700 (PDT)
MIME-Version: 1.0
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
In-Reply-To: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
From: Jim Meyering <jim@HIDDEN>
Date: Sat, 1 Apr 2023 21:15:07 -0700
Message-ID: <CA+8g5KHc9W4dBJufWv96wd53ZE4fcYxrynuUEh9vNR-_1LFYbQ@HIDDEN>
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
To: Koen Claessen <koen@HIDDEN>
Content-Type: multipart/mixed; boundary="000000000000bfb5b205f852b0a6"
X-Spam-Score: 0.2 (/)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.8 (/)
--000000000000bfb5b205f852b0a6
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
On Mon, Mar 27, 2023 at 6:15=E2=80=AFAM Koen Claessen <koen@HIDDEN> wr=
ote:
> Running the command:
>
> echo a | grep -E -w '((()|a)|())*'
>
> does not terminate, and uses a LOT of processor time, for all versions of
> grep I have tried.
>
> This is the smallest case that could be found; simplifying anything in th=
e
> input and/or expression leads to correct behavior.
Thank you! How did you find that?
FYI, this strikes grep-3.10 (on Fedora 37/glibc-2.36-9.fc37.x86_64)
when using LC_ALL=3Den_US.UTF-8, but not with LC_ALL=3DC.
I.e., this infloops:
echo a | LC_ALL=3Den_US.UTF-8 grep -E -w '((()|a)|())*'
but this works as expected and promptly prints its line of input:
echo a | LC_ALL=3DC grep -E -w '((()|a)|())*'
For now, I've added an expected-failing test case for this bug:
--000000000000bfb5b205f852b0a6
Content-Type: application/octet-stream; name="grep-glibc-infloop.patch"
Content-Disposition: attachment; filename="grep-glibc-infloop.patch"
Content-Transfer-Encoding: base64
Content-ID: <f_lfyw0lqb0>
X-Attachment-Id: f_lfyw0lqb0
RnJvbSBlNDQ4ZTg2YjVjMDFjMGE5OGIxNmU4NGQ2MWVkZTc5NDAxNzBiYjViIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog
U2F0LCAxIEFwciAyMDIzIDIxOjAxOjAxIC0wNzAwClN1YmplY3Q6IFtQQVRDSF0gdGVzdHM6IGFk
ZCBhIGtub3duLWZhaWxpbmcgZ2xpYmMtaW5mbG9vcCB0ZXN0CgoqIHRlc3RzL2dsaWJjLWluZmxv
b3A6IE5ldyBmaWxlLgpCYXNlZCBvbiB0aGUgY29tbWFuZCBmcm9tIEtvZW4gQ2xhZXNzZW4KcmVw
b3J0ZWQgaW4gaHR0cHM6Ly9idWdzLmdudS5vcmcvNjI0ODMKKiB0ZXN0cy9NYWtlZmlsZS5hbSAo
VEVTVFMpOiBBZGQgdGhlIGZpbGUgbmFtZQoqIFRIQU5LUy5pbjogQWRkIG5hbWUgb2YgcmVwb3J0
ZXIuCi0tLQogVEhBTktTLmluICAgICAgICAgICB8ICAxICsKIHRlc3RzL01ha2VmaWxlLmFtICAg
fCAgMyArKysKIHRlc3RzL2dsaWJjLWluZmxvb3AgfCAxMyArKysrKysrKysrKysrCiAzIGZpbGVz
IGNoYW5nZWQsIDE3IGluc2VydGlvbnMoKykKIGNyZWF0ZSBtb2RlIDEwMDc1NSB0ZXN0cy9nbGli
Yy1pbmZsb29wCgpkaWZmIC0tZ2l0IGEvVEhBTktTLmluIGIvVEhBTktTLmluCmluZGV4IGZkMzYx
OTIuLjA2ZjhjNTQgMTAwNjQ0Ci0tLSBhL1RIQU5LUy5pbgorKysgYi9USEFOS1MuaW4KQEAgLTU2
LDYgKzU2LDcgQEAgS2FybCBQZXR0ZXJzc29uICAgICAgICAgICAgICAgICAgICAga2FybC5wZXR0
ZXJzc29uQGtscG4uc2UKIEthdmVoIFIuIEdoYXppICAgICAgICAgICAgICAgICAgICAgIGdoYXpp
QGNhaXAucnV0Z2Vycy5lZHUKIEthenVybyBGdXJ1a2F3YSAgICAgICAgICAgICAgICAgICAgIGZ1
cnVrYXdhQGFwcmljb3Qua2VrLmpwCiBLZWl0aCBCb3N0aWMgICAgICAgICAgICAgICAgICAgICAg
ICBib3N0aWNAYnNkaS5jb20KK0tvZW4gQ2xhZXNzZW4gICAgICAgICAgICAgICAgICAgICAgIGtv
ZW5AY2hhbG1lcnMuc2UKIEtyaXNobmEgU2V0aHVyYW1hbiAgICAgICAgICAgICAgICAgIGtyaXNo
bmFAc2dpaHViLmNvcnAuc2dpLmNvbQogS3VydCBEIFNjaHdlaHIgICAgICAgICAgICAgICAgICAg
ICAga2RzY2h3ZWhAaW5zY2kxNC51Y3NkLmVkdQogTHVkb3ZpYyBDb3VydMOocyAgICAgICAgICAg
ICAgICAgICAgIGx1ZG9AZ251Lm9yZwpkaWZmIC0tZ2l0IGEvdGVzdHMvTWFrZWZpbGUuYW0gYi90
ZXN0cy9NYWtlZmlsZS5hbQppbmRleCBmMTk1YzhkLi4wYzNhZGJlIDEwMDY0NAotLS0gYS90ZXN0
cy9NYWtlZmlsZS5hbQorKysgYi90ZXN0cy9NYWtlZmlsZS5hbQpAQCAtNDksNiArNDksOCBAQCBM
REFERCA9IC4uL2xpYi9saWJncmVwdXRpbHMuYSAkKExJQklOVEwpIC4uL2xpYi9saWJncmVwdXRp
bHMuYQogIyBGSVhNRS0yMDE1OiBSZW1vdmUgdGhpcyBvbmNlIHRoZSBnbGliYyBhbmQgZ251bGli
IGJ1Z3MgYXJlIGZpeGVkLgogWEZBSUxfVEVTVFMgPSB0cmlwbGUtYmFja3JlZgoKK1hGQUlMX1RF
U1RTICs9IGdsaWJjLWluZmxvb3AKKwogIyBFcXVpdmFsZW5jZSBjbGFzc2VzIGFyZSBvbmx5IHN1
cHBvcnRlZCB3aGVuIHVzaW5nIHRoZSBzeXN0ZW0KICMgbWF0Y2hlciAod2hpY2ggbWVhbnMgb25s
eSB3aXRoIGdsaWJjKS4KICMgVGhlIGluY2x1ZGVkIG1hdGNoZXIgbmVlZHMgdG8gYmUgZml4ZWQu
CkBAIC0xMDgsNiArMTEwLDcgQEAgVEVTVFMgPQkJCQkJCVwKICAgZmlsbGJ1Zi1sb25nLWxpbmUJ
CQkJXAogICBmbWJ0ZXN0CQkJCQlcCiAgIGZvYWQxCQkJCQkJXAorICBnbGliYy1pbmZsb29wCQkJ
CQlcCiAgIGdyZXAtZGV2LW51bGwJCQkJCVwKICAgZ3JlcC1kZXYtbnVsbC1vdXQJCQkJXAogICBn
cmVwLWRpcgkJCQkJXApkaWZmIC0tZ2l0IGEvdGVzdHMvZ2xpYmMtaW5mbG9vcCBiL3Rlc3RzL2ds
aWJjLWluZmxvb3AKbmV3IGZpbGUgbW9kZSAxMDA3NTUKaW5kZXggMDAwMDAwMC4uNjI0NGI2Ygot
LS0gL2Rldi9udWxsCisrKyBiL3Rlc3RzL2dsaWJjLWluZmxvb3AKQEAgLTAsMCArMSwxMyBAQAor
IyEvYmluL3NoCisjIFRoaXMgd291bGQgaW5mbG9vcCB3aGVuIHVzaW5nIGdsaWJjJ3MgcmVnZXgg
YXQgbGVhc3QgdW50aWwgZ2xpYmMtMi4zNi4KKy4gIiR7c3JjZGlyPS59L2luaXQuc2giOyBwYXRo
X3ByZXBlbmRfIC4uL3NyYworCityZXF1aXJlX3RpbWVvdXRfCityZXF1aXJlX2VuX3V0ZjhfbG9j
YWxlXworCitmYWlsPTAKKworZWNobyBhID4gaW4gfHwgZnJhbWV3b3JrX2ZhaWx1cmVfCityZXR1
cm5zXyAwIHRpbWVvdXQgMiBlbnYgTENfQUxMPWVuX1VTLlVURi04IGdyZXAgLUUgLXcgJygoKCl8
YSl8KCkpKicgaW4gfHwgZmFpbD0xCisKK0V4aXQgJGZhaWwKLS0gCjIuNDAuMC4xNjYuZzE0MGI5
NDc4ZGEKCg==
--000000000000bfb5b205f852b0a6--
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at 62483) by debbugs.gnu.org; 31 Mar 2023 15:25:45 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Mar 31 11:25:45 2023
Received: from localhost ([127.0.0.1]:34254 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1piGdM-0007El-K8
for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:45 -0400
Received: from tensor.andric.com ([87.251.56.140]:64446 ident=postfix)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <dimitry@HIDDEN>) id 1piGdI-0007EW-7c
for 62483 <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:42 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(No client certificate requested)
by tensor.andric.com (Postfix) with ESMTPSA id 559306059D;
Fri, 31 Mar 2023 17:25:37 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
t=1680276337; bh=tv3wdsLbUzJuawKCrRN9gRq8xh79liGVtP2VlXgo/zE=;
h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
b=Nz2cHPImbM0SK6j1NdD5Aygk3RL4epuqwPgTaTrK8MurzsrOZO/aL+KQbRgKDBhbj
aplfthjlXLsiJZDVL2pSxwJVm4yxAp7BR65GpLgbw2PUwYPVGA4Hx3T4CCKcuH/M+R
p85inkdVWLNGlChp1juhxPiFh2YP1gtM31fXWxnU8876mdCg1/vBX1C4O3bYA9g7eA
cQkApkFp7WRvHQpBYsuiYsFA1e5QyXFSbf6ARiPW5O9EWj/MxPeymomqcOYvoj+k43
/NU1qEh1dbeXftTI+hKE+QfbZO3MWzD3rSKS1tz0nRW8/kPQB2qwKW6ZDDWY5lwk42
JgFZiA0j/f23g==
Content-Type: multipart/signed;
boundary="Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2";
protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202303280646.32S6krUr030677@HIDDEN>
Date: Fri, 31 Mar 2023 17:25:21 +0200
Message-Id: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
<C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
<202303280646.32S6krUr030677@HIDDEN>
To: arnold@HIDDEN
X-Mailer: Apple Mail (2.3731.500.231)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)
--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=us-ascii
Yes, it still reproduces when I configure the latest grep using
--without-included-regex:
1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0
(gdb)
1428 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
1: idx =3D 0
(gdb)
1432 if (__glibc_unlikely (cur_node < 0))
1: idx =3D 0
(gdb)
1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0
The endless loop looks the same. grep's regexec.c is mostly the same as
glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
directives added.
-Dimitry
> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>=20
> This does not reproduce with gawk, even when I force use of the regex
> matcher.
>=20
> What happens if grep is built with the included regex files instead of
> relying on the ones in the local glibc?
>=20
> Arnold
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>=20
>>> Running the command:
>>>=20
>>> echo a | grep -E -w '((()|a)|())*'
>>>=20
>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>> grep I have tried.
>>>=20
>>> This is the smallest case that could be found; simplifying anything =
in the
>>> input and/or expression leads to correct behavior.
>>=20
>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>=20
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ =
COMMAND
>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 =
grep -E -w ((()|a)|())*
>>=20
>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>=20
>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>=20
>> 1378 static reg_errcode_t
>> 1379 __attribute_warn_unused_result__
>> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, =
size_t nmatch,
>> 1381 regmatch_t *pmatch, bool fl_backtrack)
>> 1382 {
>> ...
>> 1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1416 {
>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
>> 1418
>> 1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> ...
>> 1442 /* Proceed to next node. */
>> 1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>> 1444 &idx, cur_node,
>> 1445 &eps_via_nodes, fs);
>> 1446
>> 1447 if (__glibc_unlikely (cur_node < 0))
>> ...
>> 1465 }
>> 1466 }
>>=20
>> -Dimitry
>>=20
>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>=20
--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename=signature.asc
Content-Type: application/pgp-signature;
name=signature.asc
Content-Description: Message signed with OpenPGP
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2
iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCb7YQAKCRCwXqMKLiCW
o+dZAJoC/FUUV4nkFDYncYF4RZt9TDj6FACfZpp9LHb3L1pn/aWcAXlxCWJKnU0=
=R/1i
-----END PGP SIGNATURE-----
--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2--
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at submit) by debbugs.gnu.org; 31 Mar 2023 15:25:51 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Mar 31 11:25:51 2023
Received: from localhost ([127.0.0.1]:34257 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1piGdT-0007F5-3t
for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:51 -0400
Received: from lists.gnu.org ([209.51.188.17]:56940)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <dimitry@HIDDEN>) id 1piGdN-0007Et-MP
for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:46 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
id 1piGdL-0008Od-T2
for bug-grep@HIDDEN; Fri, 31 Mar 2023 11:25:45 -0400
Received: from tensor.andric.com ([87.251.56.140])
by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
id 1piGdJ-0007lc-Tf
for bug-grep@HIDDEN; Fri, 31 Mar 2023 11:25:43 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(No client certificate requested)
by tensor.andric.com (Postfix) with ESMTPSA id 559306059D;
Fri, 31 Mar 2023 17:25:37 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
t=1680276337; bh=tv3wdsLbUzJuawKCrRN9gRq8xh79liGVtP2VlXgo/zE=;
h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
b=Nz2cHPImbM0SK6j1NdD5Aygk3RL4epuqwPgTaTrK8MurzsrOZO/aL+KQbRgKDBhbj
aplfthjlXLsiJZDVL2pSxwJVm4yxAp7BR65GpLgbw2PUwYPVGA4Hx3T4CCKcuH/M+R
p85inkdVWLNGlChp1juhxPiFh2YP1gtM31fXWxnU8876mdCg1/vBX1C4O3bYA9g7eA
cQkApkFp7WRvHQpBYsuiYsFA1e5QyXFSbf6ARiPW5O9EWj/MxPeymomqcOYvoj+k43
/NU1qEh1dbeXftTI+hKE+QfbZO3MWzD3rSKS1tz0nRW8/kPQB2qwKW6ZDDWY5lwk42
JgFZiA0j/f23g==
Content-Type: multipart/signed;
boundary="Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2";
protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202303280646.32S6krUr030677@HIDDEN>
Date: Fri, 31 Mar 2023 17:25:21 +0200
Message-Id: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
<C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
<202303280646.32S6krUr030677@HIDDEN>
To: arnold@HIDDEN
X-Mailer: Apple Mail (2.3731.500.231)
Received-SPF: pass client-ip=87.251.56.140; envelope-from=dimitry@HIDDEN;
helo=tensor.andric.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,
SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Debbugs-Envelope-To: submit
Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)
--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=us-ascii
Yes, it still reproduces when I configure the latest grep using
--without-included-regex:
1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0
(gdb)
1428 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
1: idx =3D 0
(gdb)
1432 if (__glibc_unlikely (cur_node < 0))
1: idx =3D 0
(gdb)
1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0
The endless loop looks the same. grep's regexec.c is mostly the same as
glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
directives added.
-Dimitry
> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>=20
> This does not reproduce with gawk, even when I force use of the regex
> matcher.
>=20
> What happens if grep is built with the included regex files instead of
> relying on the ones in the local glibc?
>=20
> Arnold
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>=20
>>> Running the command:
>>>=20
>>> echo a | grep -E -w '((()|a)|())*'
>>>=20
>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>> grep I have tried.
>>>=20
>>> This is the smallest case that could be found; simplifying anything =
in the
>>> input and/or expression leads to correct behavior.
>>=20
>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>=20
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ =
COMMAND
>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 =
grep -E -w ((()|a)|())*
>>=20
>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>=20
>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>=20
>> 1378 static reg_errcode_t
>> 1379 __attribute_warn_unused_result__
>> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, =
size_t nmatch,
>> 1381 regmatch_t *pmatch, bool fl_backtrack)
>> 1382 {
>> ...
>> 1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1416 {
>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
>> 1418
>> 1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> ...
>> 1442 /* Proceed to next node. */
>> 1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>> 1444 &idx, cur_node,
>> 1445 &eps_via_nodes, fs);
>> 1446
>> 1447 if (__glibc_unlikely (cur_node < 0))
>> ...
>> 1465 }
>> 1466 }
>>=20
>> -Dimitry
>>=20
>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>=20
--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename=signature.asc
Content-Type: application/pgp-signature;
name=signature.asc
Content-Description: Message signed with OpenPGP
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2
iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCb7YQAKCRCwXqMKLiCW
o+dZAJoC/FUUV4nkFDYncYF4RZt9TDj6FACfZpp9LHb3L1pn/aWcAXlxCWJKnU0=
=R/1i
-----END PGP SIGNATURE-----
--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2--
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at 62483) by debbugs.gnu.org; 28 Mar 2023 06:47:07 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Mar 28 02:47:07 2023
Received: from localhost ([127.0.0.1]:48750 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1ph36p-0001Yd-2p
for submit <at> debbugs.gnu.org; Tue, 28 Mar 2023 02:47:07 -0400
Received: from frenzy.freefriends.org ([198.99.81.75]:41354
helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <arnold@HIDDEN>) id 1ph36m-0001YU-4b
for 62483 <at> debbugs.gnu.org; Tue, 28 Mar 2023 02:47:05 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
by freefriends.org (8.14.7/8.14.7) with ESMTP id 32S6ksEG030678
(version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
Tue, 28 Mar 2023 00:46:55 -0600
Received: (from arnold@localhost)
by freefriends.org (8.14.7/8.14.7/Submit) id 32S6krUr030677;
Tue, 28 Mar 2023 00:46:53 -0600
From: arnold@HIDDEN
Message-Id: <202303280646.32S6krUr030677@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
arnold@HIDDEN using -f
Date: Tue, 28 Mar 2023 00:46:53 -0600
To: koen@HIDDEN, dimitry@HIDDEN
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
<C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
In-Reply-To: <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.0 (+)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)
This does not reproduce with gawk, even when I force use of the regex
matcher.
What happens if grep is built with the included regex files instead of
relying on the ones in the local glibc?
Arnold
Dimitry Andric <dimitry@HIDDEN> wrote:
> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >
> > Running the command:
> >
> > echo a | grep -E -w '((()|a)|())*'
> >
> > does not terminate, and uses a LOT of processor time, for all versions of
> > grep I have tried.
> >
> > This is the smallest case that could be found; simplifying anything in the
> > input and/or expression leads to correct behavior.
>
> Reproducible with GNU grep 3.7 on Ubuntu 22:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())*
>
> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
>
> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
>
> 1378 static reg_errcode_t
> 1379 __attribute_warn_unused_result__
> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> 1381 regmatch_t *pmatch, bool fl_backtrack)
> 1382 {
> ...
> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1416 {
> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1418
> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> ...
> 1442 /* Proceed to next node. */
> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> 1444 &idx, cur_node,
> 1445 &eps_via_nodes, fs);
> 1446
> 1447 if (__glibc_unlikely (cur_node < 0))
> ...
> 1465 }
> 1466 }
>
> -Dimitry
>
> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
>
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
Received: (at 62483) by debbugs.gnu.org; 27 Mar 2023 17:54:48 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Mar 27 13:54:48 2023
Received: from localhost ([127.0.0.1]:48396 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1pgr3P-0006af-MB
for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 13:54:48 -0400
Received: from tensor.andric.com ([87.251.56.140]:61183 ident=postfix)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <dimitry@HIDDEN>) id 1pgr3O-0006aT-2P
for 62483 <at> debbugs.gnu.org; Mon, 27 Mar 2023 13:54:47 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(No client certificate requested)
by tensor.andric.com (Postfix) with ESMTPSA id C41BC316CF;
Mon, 27 Mar 2023 19:54:42 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
t=1679939682; bh=v6pn1ABwmo2Zk9WlVVTSX2HlDTRrAfOORf4/PvUrqeQ=;
h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
b=L1SvqBeqaLZyAehXMuCh/SjMza9jiRzcc6wkwkKkhHpwRIOL4sLOoM6Adxphr00Mv
xOL8dQszUSuyFlnwvqZUlYHmLFQ+pt73Y6zor1+WRuB4As5/OBjFlhOX0O8EEiel1a
4eGWJAZR/FmRt7E9t7txMFx7spGuIY+3VLv80eDqwp908gbxEcFBKDhSjNz4v5BUXV
j9ThNMHLQeqo+ask6CkYXCsHyidfXdi9Sy6rn8F23AMb6RbhD4UoL0T+kZFM2bRcC6
7NCeb6tEcgKfRDGlNJgvpyBcdxyYTgE3iCgIP1L7BXpBhN5P6lA0j7K3OMeadi3Pc4
XR7NOl0WTc3pw==
Content-Type: multipart/signed;
boundary="Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23";
protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.400.51.1.1\))
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
Date: Mon, 27 Mar 2023 19:54:27 +0200
Message-Id: <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
To: Koen Claessen <koen@HIDDEN>
X-Mailer: Apple Mail (2.3731.400.51.1.1)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 62483
Cc: 62483 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)
--Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=us-ascii
On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>=20
> Running the command:
>=20
> echo a | grep -E -w '((()|a)|())*'
>=20
> does not terminate, and uses a LOT of processor time, for all versions =
of
> grep I have tried.
>=20
> This is the smallest case that could be found; simplifying anything in =
the
> input and/or expression leads to correct behavior.
Reproducible with GNU grep 3.7 on Ubuntu 22:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ =
COMMAND
93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 =
grep -E -w ((()|a)|())*
It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
It loops between lines 1415, 1417 and 1443, but idx and cur_node never =
change from 0:
1378 static reg_errcode_t
1379 __attribute_warn_unused_result__
1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, =
size_t nmatch,
1381 regmatch_t *pmatch, bool fl_backtrack)
1382 {
...
1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1416 {
1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1418
1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1420 || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
...
1442 /* Proceed to next node. */
1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
1444 &idx, cur_node,
1445 &eps_via_nodes, fs);
1446
1447 if (__glibc_unlikely (cur_node < 0))
...
1465 }
1466 }
-Dimitry
P.S.: Interestingly this does not reproduce with BSD grep, which returns =
immediately with "a".
--Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename=signature.asc
Content-Type: application/pgp-signature;
name=signature.asc
Content-Description: Message signed with OpenPGP
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2
iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCHYUwAKCRCwXqMKLiCW
o+9rAKD66Kl1mgPSqRVaG1i0Y05AnBOdBACfYshg0HktpRuLAQQBLyyAQHYSENU=
=qC4Q
-----END PGP SIGNATURE-----
--Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23--
bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.Received: (at submit) by debbugs.gnu.org; 27 Mar 2023 13:14:11 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Mon Mar 27 09:14:11 2023 Received: from localhost ([127.0.0.1]:46774 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pgmfq-0005Kz-Vo for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 09:14:11 -0400 Received: from lists.gnu.org ([209.51.188.17]:59458) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <koen.claessen@HIDDEN>) id 1pgivx-0004UW-6G for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 05:14:33 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <koen.claessen@HIDDEN>) id 1pgivw-0002tj-DW for bug-grep@HIDDEN; Mon, 27 Mar 2023 05:14:32 -0400 Received: from mail-oi1-f175.google.com ([209.85.167.175]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <koen.claessen@HIDDEN>) id 1pgivu-0000lO-J0 for bug-grep@HIDDEN; Mon, 27 Mar 2023 05:14:32 -0400 Received: by mail-oi1-f175.google.com with SMTP id bi31so5781083oib.9 for <bug-grep@HIDDEN>; Mon, 27 Mar 2023 02:14:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679908468; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=I0nsVRHdMBEFAIktPr2QorHZIFkZGpyQddZo2ESBGJ8=; b=2KbYRm6gB/nY5JRtjAjJaxhmjLHPSJBO8Cy78zB7X/9GPD5KOF/l9yexPGoJaB9nY1 43kJglMMfnl5tW61bsgJXRntfquVQhc5ksyWEkvo92vqmW6JbZZgr9CH99P7xAm1dzW9 q9/iMRMAttFeav8z95fj1lrvGNrCQaQ6x8r86bs0a2K5jbIvmOdNi6fpA5A97mNpMkuk MHlXMX630nG1TqFihmHmzmxhsEHn+EEoPH1ghbEXSVLfXsm+x97xq088/pRTPVPnEscC jBQhrDStI6y8q+HtPWQ4Nl7YIWMd/KADw3+N9UH/+CQDQcNipBBtyAc3Btc4wk6umvAY yQsw== X-Gm-Message-State: AO0yUKWR05lCph9vF+pSaIybDbqBKWRRVrJeEwzH0KBdtHCnzIspBUj/ U+hWrGx2XjyYZBmUJaJ1E2qYlJKqSkrFn9E2NfbZc/ug+Q4= X-Google-Smtp-Source: AK7set+FJwBk1ZfnerizAPa/0MtB+zNp+gLKRAc9psjQo/PlnuSfdZVsLPCMkXJvQJtu5+bxXOV+O9UTykY9L0q4dpc= X-Received: by 2002:a05:6808:b38:b0:387:1afd:5924 with SMTP id t24-20020a0568080b3800b003871afd5924mr2930420oij.8.1679908468569; Mon, 27 Mar 2023 02:14:28 -0700 (PDT) MIME-Version: 1.0 From: Koen Claessen <koen@HIDDEN> Date: Mon, 27 Mar 2023 11:14:17 +0200 Message-ID: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> Subject: echo a | grep -E -w '((()|a)|())*' # does not terminate To: bug-grep@HIDDEN Content-Type: multipart/alternative; boundary="000000000000863aec05f7de2b71" Received-SPF: pass client-ip=209.85.167.175; envelope-from=koen.claessen@HIDDEN; helo=mail-oi1-f175.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.25, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.1 (-) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Mon, 27 Mar 2023 09:14:05 -0400 X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -2.1 (--) --000000000000863aec05f7de2b71 Content-Type: text/plain; charset="UTF-8" Hello! Running the command: echo a | grep -E -w '((()|a)|())*' does not terminate, and uses a LOT of processor time, for all versions of grep I have tried. This is the smallest case that could be found; simplifying anything in the input and/or expression leads to correct behavior. Kind regards, /Koen --000000000000863aec05f7de2b71 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable <div dir=3D"ltr">Hello!<div><br></div><div>Running the command:</div><div><= br></div><div>=C2=A0=C2=A0echo a | grep -E -w '((()|a)|())*'</div><= div><br></div><div>does not terminate, and uses a LOT of processor time, fo= r all versions of grep I have tried.</div><div><br></div><div>This is the s= mallest case that could be found; simplifying anything in the input and/or = expression leads to correct behavior.</div><div><br></div><div>Kind regards= ,</div><div>/Koen</div></div> --000000000000863aec05f7de2b71--
Koen Claessen <koen@HIDDEN>:bug-grep@HIDDEN.
Full text available.bug-grep@HIDDEN:bug#62483; Package grep.
Full text available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.