GNU logs - #62483, boring messages


Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: Koen Claessen <koen@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Mon, 27 Mar 2023 13:15:05 +0000
Resent-Message-ID: <handler.62483.B.167992285120532 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: 62483 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-grep@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.167992285120532
          (code B ref -1); Mon, 27 Mar 2023 13:15:05 +0000
Received: (at submit) by debbugs.gnu.org; 27 Mar 2023 13:14:11 +0000
Received: from localhost ([127.0.0.1]:46774 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pgmfq-0005Kz-Vo
	for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 09:14:11 -0400
Received: from lists.gnu.org ([209.51.188.17]:59458)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <koen.claessen@HIDDEN>) id 1pgivx-0004UW-6G
 for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 05:14:33 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <koen.claessen@HIDDEN>)
 id 1pgivw-0002tj-DW
 for bug-grep@HIDDEN; Mon, 27 Mar 2023 05:14:32 -0400
Received: from mail-oi1-f175.google.com ([209.85.167.175])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <koen.claessen@HIDDEN>)
 id 1pgivu-0000lO-J0
 for bug-grep@HIDDEN; Mon, 27 Mar 2023 05:14:32 -0400
Received: by mail-oi1-f175.google.com with SMTP id bi31so5781083oib.9
 for <bug-grep@HIDDEN>; Mon, 27 Mar 2023 02:14:29 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112; t=1679908468;
 h=to:subject:message-id:date:from:mime-version:x-gm-message-state
 :from:to:cc:subject:date:message-id:reply-to;
 bh=I0nsVRHdMBEFAIktPr2QorHZIFkZGpyQddZo2ESBGJ8=;
 b=2KbYRm6gB/nY5JRtjAjJaxhmjLHPSJBO8Cy78zB7X/9GPD5KOF/l9yexPGoJaB9nY1
 43kJglMMfnl5tW61bsgJXRntfquVQhc5ksyWEkvo92vqmW6JbZZgr9CH99P7xAm1dzW9
 q9/iMRMAttFeav8z95fj1lrvGNrCQaQ6x8r86bs0a2K5jbIvmOdNi6fpA5A97mNpMkuk
 MHlXMX630nG1TqFihmHmzmxhsEHn+EEoPH1ghbEXSVLfXsm+x97xq088/pRTPVPnEscC
 jBQhrDStI6y8q+HtPWQ4Nl7YIWMd/KADw3+N9UH/+CQDQcNipBBtyAc3Btc4wk6umvAY
 yQsw==
X-Gm-Message-State: AO0yUKWR05lCph9vF+pSaIybDbqBKWRRVrJeEwzH0KBdtHCnzIspBUj/
 U+hWrGx2XjyYZBmUJaJ1E2qYlJKqSkrFn9E2NfbZc/ug+Q4=
X-Google-Smtp-Source: AK7set+FJwBk1ZfnerizAPa/0MtB+zNp+gLKRAc9psjQo/PlnuSfdZVsLPCMkXJvQJtu5+bxXOV+O9UTykY9L0q4dpc=
X-Received: by 2002:a05:6808:b38:b0:387:1afd:5924 with SMTP id
 t24-20020a0568080b3800b003871afd5924mr2930420oij.8.1679908468569; Mon, 27 Mar
 2023 02:14:28 -0700 (PDT)
MIME-Version: 1.0
From: Koen Claessen <koen@HIDDEN>
Date: Mon, 27 Mar 2023 11:14:17 +0200
Message-ID: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
Content-Type: multipart/alternative; boundary="000000000000863aec05f7de2b71"
Received-SPF: pass client-ip=209.85.167.175;
 envelope-from=koen.claessen@HIDDEN; helo=mail-oi1-f175.google.com
X-Spam_score_int: -13
X-Spam_score: -1.4
X-Spam_bar: -
X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.25,
 FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.25, HTML_MESSAGE=0.001,
 RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.1 (-)
X-Mailman-Approved-At: Mon, 27 Mar 2023 09:14:05 -0400
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.1 (--)

--000000000000863aec05f7de2b71
Content-Type: text/plain; charset="UTF-8"

Hello!

Running the command:

  echo a | grep -E -w '((()|a)|())*'

does not terminate, and uses a LOT of processor time, for all versions of
grep I have tried.

This is the smallest case that could be found; simplifying anything in the
input and/or expression leads to correct behavior.

Kind regards,
/Koen

--000000000000863aec05f7de2b71
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hello!<div><br></div><div>Running the command:</div><div><=
br></div><div>=C2=A0=C2=A0echo a | grep -E -w &#39;((()|a)|())*&#39;</div><=
div><br></div><div>does not terminate, and uses a LOT of processor time, fo=
r all versions of grep I have tried.</div><div><br></div><div>This is the s=
mallest case that could be found; simplifying anything in the input and/or =
expression leads to correct behavior.</div><div><br></div><div>Kind regards=
,</div><div>/Koen</div></div>

--000000000000863aec05f7de2b71--




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Koen Claessen <koen@HIDDEN>
Subject: bug#62483: Acknowledgement (echo a | grep -E -w '((()|a)|())*' #
 does not terminate)
Message-ID: <handler.62483.B.167992285120532.ack <at> debbugs.gnu.org>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
X-Gnu-PR-Message: ack 62483
X-Gnu-PR-Package: grep
Reply-To: 62483 <at> debbugs.gnu.org
Date: Mon, 27 Mar 2023 13:15:05 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-grep@HIDDEN

If you wish to submit further information on this problem, please
send it to 62483 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

--=20
62483: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D62483
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: Dimitry Andric <dimitry@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Mon, 27 Mar 2023 17:55:01 +0000
Resent-Message-ID: <handler.62483.B62483.167993968825341 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: Koen Claessen <koen@HIDDEN>
Cc: 62483 <at> debbugs.gnu.org
Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.167993968825341
          (code B ref 62483); Mon, 27 Mar 2023 17:55:01 +0000
Received: (at 62483) by debbugs.gnu.org; 27 Mar 2023 17:54:48 +0000
Received: from localhost ([127.0.0.1]:48396 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pgr3P-0006af-MB
	for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 13:54:48 -0400
Received: from tensor.andric.com ([87.251.56.140]:61183 ident=postfix)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1pgr3O-0006aT-2P
 for 62483 <at> debbugs.gnu.org; Mon, 27 Mar 2023 13:54:47 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id C41BC316CF;
 Mon, 27 Mar 2023 19:54:42 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1679939682; bh=v6pn1ABwmo2Zk9WlVVTSX2HlDTRrAfOORf4/PvUrqeQ=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=L1SvqBeqaLZyAehXMuCh/SjMza9jiRzcc6wkwkKkhHpwRIOL4sLOoM6Adxphr00Mv
 xOL8dQszUSuyFlnwvqZUlYHmLFQ+pt73Y6zor1+WRuB4As5/OBjFlhOX0O8EEiel1a
 4eGWJAZR/FmRt7E9t7txMFx7spGuIY+3VLv80eDqwp908gbxEcFBKDhSjNz4v5BUXV
 j9ThNMHLQeqo+ask6CkYXCsHyidfXdi9Sy6rn8F23AMb6RbhD4UoL0T+kZFM2bRcC6
 7NCeb6tEcgKfRDGlNJgvpyBcdxyYTgE3iCgIP1L7BXpBhN5P6lA0j7K3OMeadi3Pc4
 XR7NOl0WTc3pw==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.400.51.1.1\))
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
Date: Mon, 27 Mar 2023 19:54:27 +0200
Message-Id: <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
X-Mailer: Apple Mail (2.3731.400.51.1.1)
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)


--Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>=20
> Running the command:
>=20
>  echo a | grep -E -w '((()|a)|())*'
>=20
> does not terminate, and uses a LOT of processor time, for all versions =
of
> grep I have tried.
>=20
> This is the smallest case that could be found; simplifying anything in =
the
> input and/or expression leads to correct behavior.

Reproducible with GNU grep 3.7 on Ubuntu 22:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ =
COMMAND
  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 =
grep -E -w ((()|a)|())*

It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.

It loops between lines 1415, 1417 and 1443, but idx and cur_node never =
change from 0:

  1378  static reg_errcode_t
  1379  __attribute_warn_unused_result__
  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, =
size_t nmatch,
  1381            regmatch_t *pmatch, bool fl_backtrack)
  1382  {
...
  1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
  1416      {
  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
  1418
  1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
  1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
...
  1442        /* Proceed to next node.  */
  1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
  1444                                      &idx, cur_node,
  1445                                      &eps_via_nodes, fs);
  1446
  1447        if (__glibc_unlikely (cur_node < 0))
...
  1465          }
  1466      }

-Dimitry

P.S.: Interestingly this does not reproduce with BSD grep, which returns =
immediately with "a".


--Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCHYUwAKCRCwXqMKLiCW
o+9rAKD66Kl1mgPSqRVaG1i0Y05AnBOdBACfYshg0HktpRuLAQQBLyyAQHYSENU=
=qC4Q
-----END PGP SIGNATURE-----

--Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23--




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: arnold@HIDDEN
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Tue, 28 Mar 2023 06:48:01 +0000
Resent-Message-ID: <handler.62483.B62483.16799860275996 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: koen@HIDDEN, dimitry@HIDDEN
Cc: 62483 <at> debbugs.gnu.org
Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.16799860275996
          (code B ref 62483); Tue, 28 Mar 2023 06:48:01 +0000
Received: (at 62483) by debbugs.gnu.org; 28 Mar 2023 06:47:07 +0000
Received: from localhost ([127.0.0.1]:48750 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ph36p-0001Yd-2p
	for submit <at> debbugs.gnu.org; Tue, 28 Mar 2023 02:47:07 -0400
Received: from frenzy.freefriends.org ([198.99.81.75]:41354
 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1ph36m-0001YU-4b
 for 62483 <at> debbugs.gnu.org; Tue, 28 Mar 2023 02:47:05 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 32S6ksEG030678
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Tue, 28 Mar 2023 00:46:55 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 32S6krUr030677;
 Tue, 28 Mar 2023 00:46:53 -0600
From: arnold@HIDDEN
Message-Id: <202303280646.32S6krUr030677@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Tue, 28 Mar 2023 00:46:53 -0600
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
In-Reply-To: <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.0 (+)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

This does not reproduce with gawk, even when I force use of the regex
matcher.

What happens if grep is built with the included regex files instead of
relying on the ones in the local glibc?

Arnold

Dimitry Andric <dimitry@HIDDEN> wrote:

> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> > 
> > Running the command:
> > 
> >  echo a | grep -E -w '((()|a)|())*'
> > 
> > does not terminate, and uses a LOT of processor time, for all versions of
> > grep I have tried.
> > 
> > This is the smallest case that could be found; simplifying anything in the
> > input and/or expression leads to correct behavior.
>
> Reproducible with GNU grep 3.7 on Ubuntu 22:
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
>
> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
>
> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
>
>   1378  static reg_errcode_t
>   1379  __attribute_warn_unused_result__
>   1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
>   1381            regmatch_t *pmatch, bool fl_backtrack)
>   1382  {
> ...
>   1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>   1416      {
>   1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
>   1418
>   1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>   1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> ...
>   1442        /* Proceed to next node.  */
>   1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
>   1444                                      &idx, cur_node,
>   1445                                      &eps_via_nodes, fs);
>   1446
>   1447        if (__glibc_unlikely (cur_node < 0))
> ...
>   1465          }
>   1466      }
>
> -Dimitry
>
> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
>




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: Dimitry Andric <dimitry@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Fri, 31 Mar 2023 15:26:02 +0000
Resent-Message-ID: <handler.62483.B.168027635127847 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: arnold@HIDDEN
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.168027635127847
          (code B ref -1); Fri, 31 Mar 2023 15:26:02 +0000
Received: (at submit) by debbugs.gnu.org; 31 Mar 2023 15:25:51 +0000
Received: from localhost ([127.0.0.1]:34257 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piGdT-0007F5-3t
	for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:51 -0400
Received: from lists.gnu.org ([209.51.188.17]:56940)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1piGdN-0007Et-MP
 for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:46 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
 id 1piGdL-0008Od-T2
 for bug-grep@HIDDEN; Fri, 31 Mar 2023 11:25:45 -0400
Received: from tensor.andric.com ([87.251.56.140])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
 id 1piGdJ-0007lc-Tf
 for bug-grep@HIDDEN; Fri, 31 Mar 2023 11:25:43 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 559306059D;
 Fri, 31 Mar 2023 17:25:37 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1680276337; bh=tv3wdsLbUzJuawKCrRN9gRq8xh79liGVtP2VlXgo/zE=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=Nz2cHPImbM0SK6j1NdD5Aygk3RL4epuqwPgTaTrK8MurzsrOZO/aL+KQbRgKDBhbj
 aplfthjlXLsiJZDVL2pSxwJVm4yxAp7BR65GpLgbw2PUwYPVGA4Hx3T4CCKcuH/M+R
 p85inkdVWLNGlChp1juhxPiFh2YP1gtM31fXWxnU8876mdCg1/vBX1C4O3bYA9g7eA
 cQkApkFp7WRvHQpBYsuiYsFA1e5QyXFSbf6ARiPW5O9EWj/MxPeymomqcOYvoj+k43
 /NU1qEh1dbeXftTI+hKE+QfbZO3MWzD3rSKS1tz0nRW8/kPQB2qwKW6ZDDWY5lwk42
 JgFZiA0j/f23g==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202303280646.32S6krUr030677@HIDDEN>
Date: Fri, 31 Mar 2023 17:25:21 +0200
Message-Id: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
X-Mailer: Apple Mail (2.3731.500.231)
Received-SPF: pass client-ip=87.251.56.140; envelope-from=dimitry@HIDDEN;
 helo=tensor.andric.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)


--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Yes, it still reproduces when I configure the latest grep using
--without-included-regex:

1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0
(gdb)
1428          cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
1: idx =3D 0
(gdb)
1432          if (__glibc_unlikely (cur_node < 0))
1: idx =3D 0
(gdb)
1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0

The endless loop looks the same. grep's regexec.c is mostly the same as
glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
directives added.

-Dimitry

> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>=20
> This does not reproduce with gawk, even when I force use of the regex
> matcher.
>=20
> What happens if grep is built with the included regex files instead of
> relying on the ones in the local glibc?
>=20
> Arnold
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>=20
>>> Running the command:
>>>=20
>>> echo a | grep -E -w '((()|a)|())*'
>>>=20
>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>> grep I have tried.
>>>=20
>>> This is the smallest case that could be found; simplifying anything =
in the
>>> input and/or expression leads to correct behavior.
>>=20
>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>=20
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ =
COMMAND
>>  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 =
grep -E -w ((()|a)|())*
>>=20
>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>=20
>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>=20
>>  1378  static reg_errcode_t
>>  1379  __attribute_warn_unused_result__
>>  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, =
size_t nmatch,
>>  1381            regmatch_t *pmatch, bool fl_backtrack)
>>  1382  {
>> ...
>>  1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>  1416      {
>>  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
>>  1418
>>  1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>  1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> ...
>>  1442        /* Proceed to next node.  */
>>  1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>  1444                                      &idx, cur_node,
>>  1445                                      &eps_via_nodes, fs);
>>  1446
>>  1447        if (__glibc_unlikely (cur_node < 0))
>> ...
>>  1465          }
>>  1466      }
>>=20
>> -Dimitry
>>=20
>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>=20


--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCb7YQAKCRCwXqMKLiCW
o+dZAJoC/FUUV4nkFDYncYF4RZt9TDj6FACfZpp9LHb3L1pn/aWcAXlxCWJKnU0=
=R/1i
-----END PGP SIGNATURE-----

--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2--




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: Dimitry Andric <dimitry@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Fri, 31 Mar 2023 15:26:02 +0000
Resent-Message-ID: <handler.62483.B62483.168027634527827 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: arnold@HIDDEN
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168027634527827
          (code B ref 62483); Fri, 31 Mar 2023 15:26:02 +0000
Received: (at 62483) by debbugs.gnu.org; 31 Mar 2023 15:25:45 +0000
Received: from localhost ([127.0.0.1]:34254 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piGdM-0007El-K8
	for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:45 -0400
Received: from tensor.andric.com ([87.251.56.140]:64446 ident=postfix)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1piGdI-0007EW-7c
 for 62483 <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:42 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 559306059D;
 Fri, 31 Mar 2023 17:25:37 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1680276337; bh=tv3wdsLbUzJuawKCrRN9gRq8xh79liGVtP2VlXgo/zE=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=Nz2cHPImbM0SK6j1NdD5Aygk3RL4epuqwPgTaTrK8MurzsrOZO/aL+KQbRgKDBhbj
 aplfthjlXLsiJZDVL2pSxwJVm4yxAp7BR65GpLgbw2PUwYPVGA4Hx3T4CCKcuH/M+R
 p85inkdVWLNGlChp1juhxPiFh2YP1gtM31fXWxnU8876mdCg1/vBX1C4O3bYA9g7eA
 cQkApkFp7WRvHQpBYsuiYsFA1e5QyXFSbf6ARiPW5O9EWj/MxPeymomqcOYvoj+k43
 /NU1qEh1dbeXftTI+hKE+QfbZO3MWzD3rSKS1tz0nRW8/kPQB2qwKW6ZDDWY5lwk42
 JgFZiA0j/f23g==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202303280646.32S6krUr030677@HIDDEN>
Date: Fri, 31 Mar 2023 17:25:21 +0200
Message-Id: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
X-Mailer: Apple Mail (2.3731.500.231)
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)


--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Yes, it still reproduces when I configure the latest grep using
--without-included-regex:

1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0
(gdb)
1428          cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
1: idx =3D 0
(gdb)
1432          if (__glibc_unlikely (cur_node < 0))
1: idx =3D 0
(gdb)
1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
1: idx =3D 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
1: idx =3D 0
(gdb)
1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
1: idx =3D 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
1: idx =3D 0

The endless loop looks the same. grep's regexec.c is mostly the same as
glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
directives added.

-Dimitry

> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>=20
> This does not reproduce with gawk, even when I force use of the regex
> matcher.
>=20
> What happens if grep is built with the included regex files instead of
> relying on the ones in the local glibc?
>=20
> Arnold
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>=20
>>> Running the command:
>>>=20
>>> echo a | grep -E -w '((()|a)|())*'
>>>=20
>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>> grep I have tried.
>>>=20
>>> This is the smallest case that could be found; simplifying anything =
in the
>>> input and/or expression leads to correct behavior.
>>=20
>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>=20
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ =
COMMAND
>>  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 =
grep -E -w ((()|a)|())*
>>=20
>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>=20
>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>=20
>>  1378  static reg_errcode_t
>>  1379  __attribute_warn_unused_result__
>>  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, =
size_t nmatch,
>>  1381            regmatch_t *pmatch, bool fl_backtrack)
>>  1382  {
>> ...
>>  1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>  1416      {
>>  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, =
nmatch);
>>  1418
>>  1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>  1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> ...
>>  1442        /* Proceed to next node.  */
>>  1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>  1444                                      &idx, cur_node,
>>  1445                                      &eps_via_nodes, fs);
>>  1446
>>  1447        if (__glibc_unlikely (cur_node < 0))
>> ...
>>  1465          }
>>  1466      }
>>=20
>> -Dimitry
>>=20
>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>=20


--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCb7YQAKCRCwXqMKLiCW
o+dZAJoC/FUUV4nkFDYncYF4RZt9TDj6FACfZpp9LHb3L1pn/aWcAXlxCWJKnU0=
=R/1i
-----END PGP SIGNATURE-----

--Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2--




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: Jim Meyering <jim@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Sun, 02 Apr 2023 04:16:02 +0000
Resent-Message-ID: <handler.62483.B62483.168040892927060 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: Koen Claessen <koen@HIDDEN>
Cc: 62483 <at> debbugs.gnu.org
Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168040892927060
          (code B ref 62483); Sun, 02 Apr 2023 04:16:02 +0000
Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 04:15:29 +0000
Received: from localhost ([127.0.0.1]:38815 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pip7p-00072N-2s
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 00:15:29 -0400
Received: from mail-lj1-f177.google.com ([209.85.208.177]:41554)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <meyering@HIDDEN>) id 1pip7m-000728-2d
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 00:15:27 -0400
Received: by mail-lj1-f177.google.com with SMTP id bx10so8705350ljb.8
 for <62483 <at> debbugs.gnu.org>; Sat, 01 Apr 2023 21:15:26 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112; t=1680408920;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=DodkkdU3opFDpkTR7I38Cd8sH5Ml6OzzLr47sEPUB+w=;
 b=xz/uM78Xeo+KZkaGtVa1P9mQjfLV1aivcei2WEfZNoOFMTOkBwpKonRG/ba1JbNr83
 pYR+P1Z7IOyoWVpnhev0bwAe6bG4XFDHsU+cW10ImcfNNcmiK1ArWvyofz13j+DNKQ5n
 /vWdkbrQkfucGebgx/sLK7Jdh8OA0tZt1gbZNJnPjeTPcopGYPDCigA2gGQoEBVawv7g
 4x6kAG7avDpCkKtc8U9mnvqk+UI472MyheOtQNwoE3M/v2Hv8m9zsDYT6bHBxxGdDYle
 4lcLUHMuIgzGnfkj3/hevOpflvPGZxjPyKTr8WEjYK3ZapFlvI9VQTWpnbJK5hO6Xz77
 hAxw==
X-Gm-Message-State: AAQBX9e7GYXZxDzQFlrGuxweB9jVGPGIK9j3o02zJXm7N/ETVFXjGl10
 mYjRz0gfA7+RtfFFOJlo5APyBjtu4o0gOz8KPHQ=
X-Google-Smtp-Source: AKy350ZblXwqXc5nXQDfPymdw4nPR2IN4imJcGePAqD1vZp9OXVu/8aqbvMVAAgh/JkXmi0+y0PDQvbteeLWQRYc1Jg=
X-Received: by 2002:a2e:95d8:0:b0:29a:9053:ed1b with SMTP id
 y24-20020a2e95d8000000b0029a9053ed1bmr9738568ljh.3.1680408919857; Sat, 01 Apr
 2023 21:15:19 -0700 (PDT)
MIME-Version: 1.0
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
In-Reply-To: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
From: Jim Meyering <jim@HIDDEN>
Date: Sat, 1 Apr 2023 21:15:07 -0700
Message-ID: <CA+8g5KHc9W4dBJufWv96wd53ZE4fcYxrynuUEh9vNR-_1LFYbQ@HIDDEN>
Content-Type: multipart/mixed; boundary="000000000000bfb5b205f852b0a6"
X-Spam-Score: 0.2 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.8 (/)

--000000000000bfb5b205f852b0a6
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Mar 27, 2023 at 6:15=E2=80=AFAM Koen Claessen <koen@HIDDEN> wr=
ote:
> Running the command:
>
>   echo a | grep -E -w '((()|a)|())*'
>
> does not terminate, and uses a LOT of processor time, for all versions of
> grep I have tried.
>
> This is the smallest case that could be found; simplifying anything in th=
e
> input and/or expression leads to correct behavior.

Thank you! How did you find that?

FYI, this strikes grep-3.10 (on Fedora 37/glibc-2.36-9.fc37.x86_64)
when using LC_ALL=3Den_US.UTF-8, but not with LC_ALL=3DC.
I.e., this infloops:
   echo a | LC_ALL=3Den_US.UTF-8 grep -E -w '((()|a)|())*'

but this works as expected and promptly prints its line of input:
     echo a | LC_ALL=3DC grep -E -w '((()|a)|())*'

For now, I've added an expected-failing test case for this bug:

--000000000000bfb5b205f852b0a6
Content-Type: application/octet-stream; name="grep-glibc-infloop.patch"
Content-Disposition: attachment; filename="grep-glibc-infloop.patch"
Content-Transfer-Encoding: base64
Content-ID: <f_lfyw0lqb0>
X-Attachment-Id: f_lfyw0lqb0

RnJvbSBlNDQ4ZTg2YjVjMDFjMGE5OGIxNmU4NGQ2MWVkZTc5NDAxNzBiYjViIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog
U2F0LCAxIEFwciAyMDIzIDIxOjAxOjAxIC0wNzAwClN1YmplY3Q6IFtQQVRDSF0gdGVzdHM6IGFk
ZCBhIGtub3duLWZhaWxpbmcgZ2xpYmMtaW5mbG9vcCB0ZXN0CgoqIHRlc3RzL2dsaWJjLWluZmxv
b3A6IE5ldyBmaWxlLgpCYXNlZCBvbiB0aGUgY29tbWFuZCBmcm9tIEtvZW4gQ2xhZXNzZW4KcmVw
b3J0ZWQgaW4gaHR0cHM6Ly9idWdzLmdudS5vcmcvNjI0ODMKKiB0ZXN0cy9NYWtlZmlsZS5hbSAo
VEVTVFMpOiBBZGQgdGhlIGZpbGUgbmFtZQoqIFRIQU5LUy5pbjogQWRkIG5hbWUgb2YgcmVwb3J0
ZXIuCi0tLQogVEhBTktTLmluICAgICAgICAgICB8ICAxICsKIHRlc3RzL01ha2VmaWxlLmFtICAg
fCAgMyArKysKIHRlc3RzL2dsaWJjLWluZmxvb3AgfCAxMyArKysrKysrKysrKysrCiAzIGZpbGVz
IGNoYW5nZWQsIDE3IGluc2VydGlvbnMoKykKIGNyZWF0ZSBtb2RlIDEwMDc1NSB0ZXN0cy9nbGli
Yy1pbmZsb29wCgpkaWZmIC0tZ2l0IGEvVEhBTktTLmluIGIvVEhBTktTLmluCmluZGV4IGZkMzYx
OTIuLjA2ZjhjNTQgMTAwNjQ0Ci0tLSBhL1RIQU5LUy5pbgorKysgYi9USEFOS1MuaW4KQEAgLTU2
LDYgKzU2LDcgQEAgS2FybCBQZXR0ZXJzc29uICAgICAgICAgICAgICAgICAgICAga2FybC5wZXR0
ZXJzc29uQGtscG4uc2UKIEthdmVoIFIuIEdoYXppICAgICAgICAgICAgICAgICAgICAgIGdoYXpp
QGNhaXAucnV0Z2Vycy5lZHUKIEthenVybyBGdXJ1a2F3YSAgICAgICAgICAgICAgICAgICAgIGZ1
cnVrYXdhQGFwcmljb3Qua2VrLmpwCiBLZWl0aCBCb3N0aWMgICAgICAgICAgICAgICAgICAgICAg
ICBib3N0aWNAYnNkaS5jb20KK0tvZW4gQ2xhZXNzZW4gICAgICAgICAgICAgICAgICAgICAgIGtv
ZW5AY2hhbG1lcnMuc2UKIEtyaXNobmEgU2V0aHVyYW1hbiAgICAgICAgICAgICAgICAgIGtyaXNo
bmFAc2dpaHViLmNvcnAuc2dpLmNvbQogS3VydCBEIFNjaHdlaHIgICAgICAgICAgICAgICAgICAg
ICAga2RzY2h3ZWhAaW5zY2kxNC51Y3NkLmVkdQogTHVkb3ZpYyBDb3VydMOocyAgICAgICAgICAg
ICAgICAgICAgIGx1ZG9AZ251Lm9yZwpkaWZmIC0tZ2l0IGEvdGVzdHMvTWFrZWZpbGUuYW0gYi90
ZXN0cy9NYWtlZmlsZS5hbQppbmRleCBmMTk1YzhkLi4wYzNhZGJlIDEwMDY0NAotLS0gYS90ZXN0
cy9NYWtlZmlsZS5hbQorKysgYi90ZXN0cy9NYWtlZmlsZS5hbQpAQCAtNDksNiArNDksOCBAQCBM
REFERCA9IC4uL2xpYi9saWJncmVwdXRpbHMuYSAkKExJQklOVEwpIC4uL2xpYi9saWJncmVwdXRp
bHMuYQogIyBGSVhNRS0yMDE1OiBSZW1vdmUgdGhpcyBvbmNlIHRoZSBnbGliYyBhbmQgZ251bGli
IGJ1Z3MgYXJlIGZpeGVkLgogWEZBSUxfVEVTVFMgPSB0cmlwbGUtYmFja3JlZgoKK1hGQUlMX1RF
U1RTICs9IGdsaWJjLWluZmxvb3AKKwogIyBFcXVpdmFsZW5jZSBjbGFzc2VzIGFyZSBvbmx5IHN1
cHBvcnRlZCB3aGVuIHVzaW5nIHRoZSBzeXN0ZW0KICMgbWF0Y2hlciAod2hpY2ggbWVhbnMgb25s
eSB3aXRoIGdsaWJjKS4KICMgVGhlIGluY2x1ZGVkIG1hdGNoZXIgbmVlZHMgdG8gYmUgZml4ZWQu
CkBAIC0xMDgsNiArMTEwLDcgQEAgVEVTVFMgPQkJCQkJCVwKICAgZmlsbGJ1Zi1sb25nLWxpbmUJ
CQkJXAogICBmbWJ0ZXN0CQkJCQlcCiAgIGZvYWQxCQkJCQkJXAorICBnbGliYy1pbmZsb29wCQkJ
CQlcCiAgIGdyZXAtZGV2LW51bGwJCQkJCVwKICAgZ3JlcC1kZXYtbnVsbC1vdXQJCQkJXAogICBn
cmVwLWRpcgkJCQkJXApkaWZmIC0tZ2l0IGEvdGVzdHMvZ2xpYmMtaW5mbG9vcCBiL3Rlc3RzL2ds
aWJjLWluZmxvb3AKbmV3IGZpbGUgbW9kZSAxMDA3NTUKaW5kZXggMDAwMDAwMC4uNjI0NGI2Ygot
LS0gL2Rldi9udWxsCisrKyBiL3Rlc3RzL2dsaWJjLWluZmxvb3AKQEAgLTAsMCArMSwxMyBAQAor
IyEvYmluL3NoCisjIFRoaXMgd291bGQgaW5mbG9vcCB3aGVuIHVzaW5nIGdsaWJjJ3MgcmVnZXgg
YXQgbGVhc3QgdW50aWwgZ2xpYmMtMi4zNi4KKy4gIiR7c3JjZGlyPS59L2luaXQuc2giOyBwYXRo
X3ByZXBlbmRfIC4uL3NyYworCityZXF1aXJlX3RpbWVvdXRfCityZXF1aXJlX2VuX3V0ZjhfbG9j
YWxlXworCitmYWlsPTAKKworZWNobyBhID4gaW4gfHwgZnJhbWV3b3JrX2ZhaWx1cmVfCityZXR1
cm5zXyAwIHRpbWVvdXQgMiBlbnYgTENfQUxMPWVuX1VTLlVURi04IGdyZXAgLUUgLXcgJygoKCl8
YSl8KCkpKicgaW4gfHwgZmFpbD0xCisKK0V4aXQgJGZhaWwKLS0gCjIuNDAuMC4xNjYuZzE0MGI5
NDc4ZGEKCg==
--000000000000bfb5b205f852b0a6--




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: arnold@HIDDEN
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Sun, 02 Apr 2023 06:53:01 +0000
Resent-Message-ID: <handler.62483.B.168041835113395 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: dimitry@HIDDEN, arnold@HIDDEN
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.168041835113395
          (code B ref -1); Sun, 02 Apr 2023 06:53:01 +0000
Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 06:52:31 +0000
Received: from localhost ([127.0.0.1]:39035 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pirZm-0003Tz-R9
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:31 -0400
Received: from lists.gnu.org ([209.51.188.17]:50340)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1pirZk-0003To-83
 for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:28 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1pirZh-0008FE-Kz
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 02:52:25 -0400
Received: from frenzy.freefriends.org ([198.99.81.75] helo=freefriends.org)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1pirZf-0006yO-Kg
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 02:52:25 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 3326q3Cs024467
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Sun, 2 Apr 2023 00:52:04 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 3326q2Ut024466;
 Sun, 2 Apr 2023 00:52:02 -0600
From: arnold@HIDDEN
Message-Id: <202304020652.3326q2Ut024466@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 00:52:02 -0600
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
In-Reply-To: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Received-SPF: none client-ip=198.99.81.75; envelope-from=arnold@HIDDEN;
 helo=freefriends.org
X-Spam_score_int: -14
X-Spam_score: -1.5
X-Spam_bar: -
X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, FORGED_SPF_HELO=0.001,
 KHOP_HELO_FCRDNS=0.399, SPF_HELO_PASS=-0.001,
 SPF_NONE=0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi.

Dimitry Andric <dimitry@HIDDEN> wrote:

> Yes, it still reproduces when I configure the latest grep using
> --without-included-regex:

I assume you meant --with-included-regex?

If you really used --without-included-regex, that doesn't prove anything... :-)

It's interesting, as gawk uses the same regex, but with different flags.
It might be worth trying to understand which of the syntax bits
is causing this.

Thanks,

Arnold

>
> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
> (gdb)
> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> 1: idx = 0
> (gdb)
> 1432          if (__glibc_unlikely (cur_node < 0))
> 1: idx = 0
> (gdb)
> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
>
> The endless loop looks the same. grep's regexec.c is mostly the same as
> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> directives added.
>
> -Dimitry
>
> > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> > 
> > This does not reproduce with gawk, even when I force use of the regex
> > matcher.
> > 
> > What happens if grep is built with the included regex files instead of
> > relying on the ones in the local glibc?
> > 
> > Arnold
> > 
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> > 
> >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>> 
> >>> Running the command:
> >>> 
> >>> echo a | grep -E -w '((()|a)|())*'
> >>> 
> >>> does not terminate, and uses a LOT of processor time, for all versions of
> >>> grep I have tried.
> >>> 
> >>> This is the smallest case that could be found; simplifying anything in the
> >>> input and/or expression leads to correct behavior.
> >> 
> >> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >> 
> >>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
> >> 
> >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >> 
> >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >> 
> >>  1378  static reg_errcode_t
> >>  1379  __attribute_warn_unused_result__
> >>  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>  1381            regmatch_t *pmatch, bool fl_backtrack)
> >>  1382  {
> >> ...
> >>  1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>  1416      {
> >>  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>  1418
> >>  1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>  1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> ...
> >>  1442        /* Proceed to next node.  */
> >>  1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>  1444                                      &idx, cur_node,
> >>  1445                                      &eps_via_nodes, fs);
> >>  1446
> >>  1447        if (__glibc_unlikely (cur_node < 0))
> >> ...
> >>  1465          }
> >>  1466      }
> >> 
> >> -Dimitry
> >> 
> >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >> 
>




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: arnold@HIDDEN
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Sun, 02 Apr 2023 06:53:02 +0000
Resent-Message-ID: <handler.62483.B62483.168041834213373 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: dimitry@HIDDEN, arnold@HIDDEN
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168041834213373
          (code B ref 62483); Sun, 02 Apr 2023 06:53:02 +0000
Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 06:52:22 +0000
Received: from localhost ([127.0.0.1]:39032 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pirZe-0003Td-3X
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:22 -0400
Received: from frenzy.freefriends.org ([198.99.81.75]:57092
 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1pirZb-0003TU-Kl
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:20 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 3326q3Cs024467
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Sun, 2 Apr 2023 00:52:04 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 3326q2Ut024466;
 Sun, 2 Apr 2023 00:52:02 -0600
From: arnold@HIDDEN
Message-Id: <202304020652.3326q2Ut024466@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 00:52:02 -0600
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
In-Reply-To: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.0 (+)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

Hi.

Dimitry Andric <dimitry@HIDDEN> wrote:

> Yes, it still reproduces when I configure the latest grep using
> --without-included-regex:

I assume you meant --with-included-regex?

If you really used --without-included-regex, that doesn't prove anything... :-)

It's interesting, as gawk uses the same regex, but with different flags.
It might be worth trying to understand which of the syntax bits
is causing this.

Thanks,

Arnold

>
> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
> (gdb)
> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> 1: idx = 0
> (gdb)
> 1432          if (__glibc_unlikely (cur_node < 0))
> 1: idx = 0
> (gdb)
> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
>
> The endless loop looks the same. grep's regexec.c is mostly the same as
> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> directives added.
>
> -Dimitry
>
> > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> > 
> > This does not reproduce with gawk, even when I force use of the regex
> > matcher.
> > 
> > What happens if grep is built with the included regex files instead of
> > relying on the ones in the local glibc?
> > 
> > Arnold
> > 
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> > 
> >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>> 
> >>> Running the command:
> >>> 
> >>> echo a | grep -E -w '((()|a)|())*'
> >>> 
> >>> does not terminate, and uses a LOT of processor time, for all versions of
> >>> grep I have tried.
> >>> 
> >>> This is the smallest case that could be found; simplifying anything in the
> >>> input and/or expression leads to correct behavior.
> >> 
> >> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >> 
> >>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
> >> 
> >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >> 
> >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >> 
> >>  1378  static reg_errcode_t
> >>  1379  __attribute_warn_unused_result__
> >>  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>  1381            regmatch_t *pmatch, bool fl_backtrack)
> >>  1382  {
> >> ...
> >>  1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>  1416      {
> >>  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>  1418
> >>  1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>  1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> ...
> >>  1442        /* Proceed to next node.  */
> >>  1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>  1444                                      &idx, cur_node,
> >>  1445                                      &eps_via_nodes, fs);
> >>  1446
> >>  1447        if (__glibc_unlikely (cur_node < 0))
> >> ...
> >>  1465          }
> >>  1466      }
> >> 
> >> -Dimitry
> >> 
> >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >> 
>




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: Dimitry Andric <dimitry@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Sun, 02 Apr 2023 09:34:02 +0000
Resent-Message-ID: <handler.62483.B62483.168042803431388 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: arnold@HIDDEN
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168042803431388
          (code B ref 62483); Sun, 02 Apr 2023 09:34:02 +0000
Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 09:33:54 +0000
Received: from localhost ([127.0.0.1]:39224 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piu5x-0008AC-Vd
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:33:54 -0400
Received: from tensor.andric.com ([87.251.56.140]:53462 ident=postfix)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1piu5u-0008A0-Gk
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:33:52 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 6ACDF6C235;
 Sun,  2 Apr 2023 11:33:48 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1680428028; bh=j++piMyInCb64AqGnHXFa/agOcQZYX5hFJCloNa82XE=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=OvpoxsYZyxsgQkhRl/xlrsYK62rT5hKW4sMbC0u3aDCYe7gYbalcM0EcbVcuZmsEp
 Ajq9eUTN6tzSLEXBOVFlEwFzoS/gkuLXqmKFqU4TKOJpsjfoXca+iNS3RN7w7k6HAP
 FyHBj1TZpHtlFZQW6GWjX1LTGv9+R+XmXi7cSlUAHzDwjONBsFO40E29XbZDoqHYdD
 xzUcUJSN1hpy7an+xzhBje/vvne3qpIEbReGf/Aab0z0j/KCjM6GT81+SKIog+el7H
 vLVb8tU69x6+y1CyoGlsfZ0IY17+aUdgLalblC2o7G4l8UObQXMmN1jOZ7QLq9QPgr
 OWfWv7UbnXYdg==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN>
Date: Sun, 2 Apr 2023 11:33:42 +0200
Message-Id: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
X-Mailer: Apple Mail (2.3731.500.231)
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)


--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
debugging purposes added CFLAGS=3D"-O0 -g".

In any case, the problematic code is both in glibc and grep, as I
believe these are originating from the same source.

-Dimitry

> On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
>=20
> Hi.
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> Yes, it still reproduces when I configure the latest grep using
>> --without-included-regex:
>=20
> I assume you meant --with-included-regex?
>=20
> If you really used --without-included-regex, that doesn't prove =
anything... :-)
>=20
> It's interesting, as gawk uses the same regex, but with different =
flags.
> It might be worth trying to understand which of the syntax bits
> is causing this.
>=20
> Thanks,
>=20
> Arnold
>=20
>>=20
>> 1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>> (gdb)
>> 1428          cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>> 1: idx =3D 0
>> (gdb)
>> 1432          if (__glibc_unlikely (cur_node < 0))
>> 1: idx =3D 0
>> (gdb)
>> 1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>>=20
>> The endless loop looks the same. grep's regexec.c is mostly the same =
as
>> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
>> directives added.
>>=20
>> -Dimitry
>>=20
>>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>>>=20
>>> This does not reproduce with gawk, even when I force use of the =
regex
>>> matcher.
>>>=20
>>> What happens if grep is built with the included regex files instead =
of
>>> relying on the ones in the local glibc?
>>>=20
>>> Arnold
>>>=20
>>> Dimitry Andric <dimitry@HIDDEN> wrote:
>>>=20
>>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>>>=20
>>>>> Running the command:
>>>>>=20
>>>>> echo a | grep -E -w '((()|a)|())*'
>>>>>=20
>>>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>>>> grep I have tried.
>>>>>=20
>>>>> This is the smallest case that could be found; simplifying =
anything in the
>>>>> input and/or expression leads to correct behavior.
>>>>=20
>>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>>>=20
>>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     =
TIME+ COMMAND
>>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   =
0:08.32 grep -E -w ((()|a)|())*
>>>>=20
>>>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>>>=20
>>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>>>=20
>>>> 1378  static reg_errcode_t
>>>> 1379  __attribute_warn_unused_result__
>>>> 1380  set_regs (const regex_t *preg, const re_match_context_t =
*mctx, size_t nmatch,
>>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
>>>> 1382  {
>>>> ...
>>>> 1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>>> 1416      {
>>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>>>> 1418
>>>> 1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>>>> ...
>>>> 1442        /* Proceed to next node.  */
>>>> 1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>>> 1444                                      &idx, cur_node,
>>>> 1445                                      &eps_via_nodes, fs);
>>>> 1446
>>>> 1447        if (__glibc_unlikely (cur_node < 0))
>>>> ...
>>>> 1465          }
>>>> 1466      }
>>>>=20
>>>> -Dimitry
>>>>=20
>>>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>>>=20
>>=20


--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZClL9gAKCRCwXqMKLiCW
o0KTAKCiIhcduUQaGUuiJfykNQNnR6+9UACg7EEpahAec8VtdrBL/qhtw0BjprA=
=F/9f
-----END PGP SIGNATURE-----

--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13--




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: Dimitry Andric <dimitry@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Sun, 02 Apr 2023 09:34:02 +0000
Resent-Message-ID: <handler.62483.B.168042804231417 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: arnold@HIDDEN
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.168042804231417
          (code B ref -1); Sun, 02 Apr 2023 09:34:02 +0000
Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 09:34:02 +0000
Received: from localhost ([127.0.0.1]:39227 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piu65-0008AY-Hx
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:34:02 -0400
Received: from lists.gnu.org ([209.51.188.17]:45166)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dimitry@HIDDEN>) id 1piu64-0008AQ-1g
 for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:34:00 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
 id 1piu62-0002Ka-6N
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 05:33:59 -0400
Received: from tensor.andric.com ([87.251.56.140])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>)
 id 1piu5y-00074R-0e
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 05:33:55 -0400
Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 6ACDF6C235;
 Sun,  2 Apr 2023 11:33:48 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904;
 t=1680428028; bh=j++piMyInCb64AqGnHXFa/agOcQZYX5hFJCloNa82XE=;
 h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From;
 b=OvpoxsYZyxsgQkhRl/xlrsYK62rT5hKW4sMbC0u3aDCYe7gYbalcM0EcbVcuZmsEp
 Ajq9eUTN6tzSLEXBOVFlEwFzoS/gkuLXqmKFqU4TKOJpsjfoXca+iNS3RN7w7k6HAP
 FyHBj1TZpHtlFZQW6GWjX1LTGv9+R+XmXi7cSlUAHzDwjONBsFO40E29XbZDoqHYdD
 xzUcUJSN1hpy7an+xzhBje/vvne3qpIEbReGf/Aab0z0j/KCjM6GT81+SKIog+el7H
 vLVb8tU69x6+y1CyoGlsfZ0IY17+aUdgLalblC2o7G4l8UObQXMmN1jOZ7QLq9QPgr
 OWfWv7UbnXYdg==
Content-Type: multipart/signed;
 boundary="Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
From: Dimitry Andric <dimitry@HIDDEN>
In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN>
Date: Sun, 2 Apr 2023 11:33:42 +0200
Message-Id: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
X-Mailer: Apple Mail (2.3731.500.231)
Received-SPF: pass client-ip=87.251.56.140; envelope-from=dimitry@HIDDEN;
 helo=tensor.andric.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)


--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
debugging purposes added CFLAGS=3D"-O0 -g".

In any case, the problematic code is both in glibc and grep, as I
believe these are originating from the same source.

-Dimitry

> On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
>=20
> Hi.
>=20
> Dimitry Andric <dimitry@HIDDEN> wrote:
>=20
>> Yes, it still reproduces when I configure the latest grep using
>> --without-included-regex:
>=20
> I assume you meant --with-included-regex?
>=20
> If you really used --without-included-regex, that doesn't prove =
anything... :-)
>=20
> It's interesting, as gawk uses the same regex, but with different =
flags.
> It might be worth trying to understand which of the syntax bits
> is causing this.
>=20
> Thanks,
>=20
> Arnold
>=20
>>=20
>> 1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>> (gdb)
>> 1428          cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>> 1: idx =3D 0
>> (gdb)
>> 1432          if (__glibc_unlikely (cur_node < 0))
>> 1: idx =3D 0
>> (gdb)
>> 1400      for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>> 1: idx =3D 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>> 1: idx =3D 0
>> (gdb)
>> 1404          if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>> 1: idx =3D 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>> 1: idx =3D 0
>>=20
>> The endless loop looks the same. grep's regexec.c is mostly the same =
as
>> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
>> directives added.
>>=20
>> -Dimitry
>>=20
>>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
>>>=20
>>> This does not reproduce with gawk, even when I force use of the =
regex
>>> matcher.
>>>=20
>>> What happens if grep is built with the included regex files instead =
of
>>> relying on the ones in the local glibc?
>>>=20
>>> Arnold
>>>=20
>>> Dimitry Andric <dimitry@HIDDEN> wrote:
>>>=20
>>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
>>>>>=20
>>>>> Running the command:
>>>>>=20
>>>>> echo a | grep -E -w '((()|a)|())*'
>>>>>=20
>>>>> does not terminate, and uses a LOT of processor time, for all =
versions of
>>>>> grep I have tried.
>>>>>=20
>>>>> This is the smallest case that could be found; simplifying =
anything in the
>>>>> input and/or expression leads to correct behavior.
>>>>=20
>>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>>>=20
>>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     =
TIME+ COMMAND
>>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   =
0:08.32 grep -E -w ((()|a)|())*
>>>>=20
>>>> It seems that at least on Ubuntu, grep in this mode uses glibc's =
regexec(3), and it is that implementation which ends up in an endless =
loop.
>>>>=20
>>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node =
never change from 0:
>>>>=20
>>>> 1378  static reg_errcode_t
>>>> 1379  __attribute_warn_unused_result__
>>>> 1380  set_regs (const regex_t *preg, const re_match_context_t =
*mctx, size_t nmatch,
>>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
>>>> 1382  {
>>>> ...
>>>> 1415    for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;)
>>>> 1416      {
>>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, =
idx, nmatch);
>>>> 1418
>>>> 1419        if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D =
mctx->last_node)
>>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, =
cur_node)))
>>>> ...
>>>> 1442        /* Proceed to next node.  */
>>>> 1443        cur_node =3D proceed_next_node (mctx, nmatch, pmatch, =
prev_idx_match,
>>>> 1444                                      &idx, cur_node,
>>>> 1445                                      &eps_via_nodes, fs);
>>>> 1446
>>>> 1447        if (__glibc_unlikely (cur_node < 0))
>>>> ...
>>>> 1465          }
>>>> 1466      }
>>>>=20
>>>> -Dimitry
>>>>=20
>>>> P.S.: Interestingly this does not reproduce with BSD grep, which =
returns immediately with "a".
>>>>=20
>>=20


--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.2

iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZClL9gAKCRCwXqMKLiCW
o0KTAKCiIhcduUQaGUuiJfykNQNnR6+9UACg7EEpahAec8VtdrBL/qhtw0BjprA=
=F/9f
-----END PGP SIGNATURE-----

--Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13--




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: arnold@HIDDEN
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Sun, 02 Apr 2023 10:05:01 +0000
Resent-Message-ID: <handler.62483.B.16804298582592 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: dimitry@HIDDEN, arnold@HIDDEN
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.16804298582592
          (code B ref -1); Sun, 02 Apr 2023 10:05:01 +0000
Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 10:04:18 +0000
Received: from localhost ([127.0.0.1]:39240 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piuZN-0000fk-QN
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:18 -0400
Received: from lists.gnu.org ([209.51.188.17]:39736)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1piuZL-0000fb-ID
 for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:16 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1piuZK-000719-JP
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 06:04:14 -0400
Received: from frenzy.freefriends.org ([198.99.81.75] helo=freefriends.org)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1piuZI-0002di-Og
 for bug-grep@HIDDEN; Sun, 02 Apr 2023 06:04:14 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 332A3umP023104
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Sun, 2 Apr 2023 04:03:57 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 332A3uGT023103;
 Sun, 2 Apr 2023 04:03:56 -0600
From: arnold@HIDDEN
Message-Id: <202304021003.332A3uGT023103@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 04:03:56 -0600
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
 <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
In-Reply-To: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Received-SPF: none client-ip=198.99.81.75; envelope-from=arnold@HIDDEN;
 helo=freefriends.org
X-Spam_score_int: -14
X-Spam_score: -1.5
X-Spam_bar: -
X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, FORGED_SPF_HELO=0.001,
 KHOP_HELO_FCRDNS=0.399, SPF_HELO_PASS=-0.001,
 SPF_NONE=0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

OK, thanks!

Dimitry Andric <dimitry@HIDDEN> wrote:

> Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
> debugging purposes added CFLAGS="-O0 -g".
>
> In any case, the problematic code is both in glibc and grep, as I
> believe these are originating from the same source.
>
> -Dimitry
>
> > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
> > 
> > Hi.
> > 
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> > 
> >> Yes, it still reproduces when I configure the latest grep using
> >> --without-included-regex:
> > 
> > I assume you meant --with-included-regex?
> > 
> > If you really used --without-included-regex, that doesn't prove anything... :-)
> > 
> > It's interesting, as gawk uses the same regex, but with different flags.
> > It might be worth trying to understand which of the syntax bits
> > is causing this.
> > 
> > Thanks,
> > 
> > Arnold
> > 
> >> 
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> (gdb)
> >> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >> 1: idx = 0
> >> (gdb)
> >> 1432          if (__glibc_unlikely (cur_node < 0))
> >> 1: idx = 0
> >> (gdb)
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> 
> >> The endless loop looks the same. grep's regexec.c is mostly the same as
> >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> >> directives added.
> >> 
> >> -Dimitry
> >> 
> >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> >>> 
> >>> This does not reproduce with gawk, even when I force use of the regex
> >>> matcher.
> >>> 
> >>> What happens if grep is built with the included regex files instead of
> >>> relying on the ones in the local glibc?
> >>> 
> >>> Arnold
> >>> 
> >>> Dimitry Andric <dimitry@HIDDEN> wrote:
> >>> 
> >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>>>> 
> >>>>> Running the command:
> >>>>> 
> >>>>> echo a | grep -E -w '((()|a)|())*'
> >>>>> 
> >>>>> does not terminate, and uses a LOT of processor time, for all versions of
> >>>>> grep I have tried.
> >>>>> 
> >>>>> This is the smallest case that could be found; simplifying anything in the
> >>>>> input and/or expression leads to correct behavior.
> >>>> 
> >>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>>> 
> >>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
> >>>> 
> >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >>>> 
> >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >>>> 
> >>>> 1378  static reg_errcode_t
> >>>> 1379  __attribute_warn_unused_result__
> >>>> 1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
> >>>> 1382  {
> >>>> ...
> >>>> 1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>>> 1416      {
> >>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>>> 1418
> >>>> 1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >>>> ...
> >>>> 1442        /* Proceed to next node.  */
> >>>> 1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>>> 1444                                      &idx, cur_node,
> >>>> 1445                                      &eps_via_nodes, fs);
> >>>> 1446
> >>>> 1447        if (__glibc_unlikely (cur_node < 0))
> >>>> ...
> >>>> 1465          }
> >>>> 1466      }
> >>>> 
> >>>> -Dimitry
> >>>> 
> >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >>>> 
> >> 
>




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: arnold@HIDDEN
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Sun, 02 Apr 2023 10:05:02 +0000
Resent-Message-ID: <handler.62483.B62483.16804298522574 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: dimitry@HIDDEN, arnold@HIDDEN
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN
Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.16804298522574
          (code B ref 62483); Sun, 02 Apr 2023 10:05:02 +0000
Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 10:04:12 +0000
Received: from localhost ([127.0.0.1]:39237 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1piuZI-0000fR-43
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:12 -0400
Received: from frenzy.freefriends.org ([198.99.81.75]:59844
 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <arnold@HIDDEN>) id 1piuZG-0000fJ-Gp
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:11 -0400
X-Envelope-From: arnold@HIDDEN
Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75])
 by freefriends.org (8.14.7/8.14.7) with ESMTP id 332A3umP023104
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Sun, 2 Apr 2023 04:03:57 -0600
Received: (from arnold@localhost)
 by freefriends.org (8.14.7/8.14.7/Submit) id 332A3uGT023103;
 Sun, 2 Apr 2023 04:03:56 -0600
From: arnold@HIDDEN
Message-Id: <202304021003.332A3uGT023103@HIDDEN>
X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to
 arnold@HIDDEN using -f
Date: Sun, 02 Apr 2023 04:03:56 -0600
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
 <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
In-Reply-To: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN>
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Score: 1.0 (+)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

OK, thanks!

Dimitry Andric <dimitry@HIDDEN> wrote:

> Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
> debugging purposes added CFLAGS="-O0 -g".
>
> In any case, the problematic code is both in glibc and grep, as I
> believe these are originating from the same source.
>
> -Dimitry
>
> > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote:
> > 
> > Hi.
> > 
> > Dimitry Andric <dimitry@HIDDEN> wrote:
> > 
> >> Yes, it still reproduces when I configure the latest grep using
> >> --without-included-regex:
> > 
> > I assume you meant --with-included-regex?
> > 
> > If you really used --without-included-regex, that doesn't prove anything... :-)
> > 
> > It's interesting, as gawk uses the same regex, but with different flags.
> > It might be worth trying to understand which of the syntax bits
> > is causing this.
> > 
> > Thanks,
> > 
> > Arnold
> > 
> >> 
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> (gdb)
> >> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >> 1: idx = 0
> >> (gdb)
> >> 1432          if (__glibc_unlikely (cur_node < 0))
> >> 1: idx = 0
> >> (gdb)
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> 1: idx = 0
> >> 
> >> The endless loop looks the same. grep's regexec.c is mostly the same as
> >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> >> directives added.
> >> 
> >> -Dimitry
> >> 
> >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote:
> >>> 
> >>> This does not reproduce with gawk, even when I force use of the regex
> >>> matcher.
> >>> 
> >>> What happens if grep is built with the included regex files instead of
> >>> relying on the ones in the local glibc?
> >>> 
> >>> Arnold
> >>> 
> >>> Dimitry Andric <dimitry@HIDDEN> wrote:
> >>> 
> >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote:
> >>>>> 
> >>>>> Running the command:
> >>>>> 
> >>>>> echo a | grep -E -w '((()|a)|())*'
> >>>>> 
> >>>>> does not terminate, and uses a LOT of processor time, for all versions of
> >>>>> grep I have tried.
> >>>>> 
> >>>>> This is the smallest case that could be found; simplifying anything in the
> >>>>> input and/or expression leads to correct behavior.
> >>>> 
> >>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>>> 
> >>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
> >>>> 
> >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >>>> 
> >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >>>> 
> >>>> 1378  static reg_errcode_t
> >>>> 1379  __attribute_warn_unused_result__
> >>>> 1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
> >>>> 1382  {
> >>>> ...
> >>>> 1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>>> 1416      {
> >>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >>>> 1418
> >>>> 1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >>>> ...
> >>>> 1442        /* Proceed to next node.  */
> >>>> 1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >>>> 1444                                      &idx, cur_node,
> >>>> 1445                                      &eps_via_nodes, fs);
> >>>> 1446
> >>>> 1447        if (__glibc_unlikely (cur_node < 0))
> >>>> ...
> >>>> 1465          }
> >>>> 1466      }
> >>>> 
> >>>> -Dimitry
> >>>> 
> >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >>>> 
> >> 
>




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Resent-From: Paul Eggert <eggert@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Sun, 02 Apr 2023 18:30:02 +0000
Resent-Message-ID: <handler.62483.B62483.168046019016876 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 62483
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: arnold@HIDDEN, dimitry@HIDDEN
Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN
Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168046019016876
          (code B ref 62483); Sun, 02 Apr 2023 18:30:02 +0000
Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 18:29:50 +0000
Received: from localhost ([127.0.0.1]:42640 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pj2Sc-0004O8-7q
	for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 14:29:50 -0400
Received: from mail.cs.ucla.edu ([131.179.128.66]:39294)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eggert@HIDDEN>) id 1pj2SX-0004Ns-KW
 for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 14:29:48 -0400
Received: from localhost (localhost [127.0.0.1])
 by mail.cs.ucla.edu (Postfix) with ESMTP id AEBCE3C09FA02;
 Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
Received: from mail.cs.ucla.edu ([127.0.0.1])
 by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id oyIQv3a3PBiJ; Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
 by mail.cs.ucla.edu (Postfix) with ESMTP id 752483C09FA01;
 Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
DKIM-Filter: OpenDKIM Filter v2.10.3 mail.cs.ucla.edu 752483C09FA01
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu;
 s=9D0B346E-2AEB-11ED-9476-E14B719DCE6C; t=1680460179;
 bh=2f3sGrU+5qn62ukHicLKudSRcxmJsffhDbks/B5+0OI=;
 h=Message-ID:Date:MIME-Version:To:From;
 b=mnBN4wAUHPlu43V8zbNB9T9pxfGf6HyO7gPzgnJgbfqqBXvW/gp9J17F6/nI3rVGT
 +rrkdp/1IUPB5eBvCcQVeBn2RpKvqC13iaomLjQ4AVagHIh4UuiftUMSw2cWamRB6W
 Y2dQr4gZNvJvDgfPnhopgZd8M4Vmb2Ndoa3j7c9n6TEAp74gomzh9DqX43neTH8sVs
 NlK6CuUZbnfuu3E69cBE1NrvW6XQqr9vz0xfj5qILzwTiilGWONSjPfOer4tt5nzRl
 ZLb3+e1lUwDk6I/8kVFh3q3CE7Py8BCBNZuXWPyDbQPObf+WY4giKoTmeImHUqXHFh
 23i2Aa5w9ntag==
X-Virus-Scanned: amavisd-new at mail.cs.ucla.edu
Received: from mail.cs.ucla.edu ([127.0.0.1])
 by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id xT4zfD34fpKg; Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
Received: from [192.168.1.9] (cpe-172-91-119-151.socal.res.rr.com
 [172.91.119.151])
 by mail.cs.ucla.edu (Postfix) with ESMTPSA id 4A99F3C097AFE;
 Sun,  2 Apr 2023 11:29:39 -0700 (PDT)
Message-ID: <576a26bb-b195-dd33-a5a4-ba38115889d5@HIDDEN>
Date: Sun, 2 Apr 2023 11:29:39 -0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.9.0
Content-Language: en-US
References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN>
 <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN>
 <202303280646.32S6krUr030677@HIDDEN>
 <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN>
 <202304020652.3326q2Ut024466@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.1 (-)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.1 (--)

On 2023-04-01 23:52, arnold@HIDDEN wrote:
> It's interesting, as gawk uses the same regex, but with different flags.

Also, GNU grep -w passes the following more-complicated regexp to dfaparse:

   (^|[^[:alnum:]_])(((()|a)|())*)([^[:alnum:]_]|$)

and quite possibly the bug is related to this extra complexity.





Last modified: Sun, 2 Apr 2023 18:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.