X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: Koen Claessen <koen@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Mon, 27 Mar 2023 13:15:05 +0000 Resent-Message-ID: <handler.62483.B.167992285120532 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: report 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 62483 <at> debbugs.gnu.org X-Debbugs-Original-To: bug-grep@HIDDEN Received: via spool by submit <at> debbugs.gnu.org id=B.167992285120532 (code B ref -1); Mon, 27 Mar 2023 13:15:05 +0000 Received: (at submit) by debbugs.gnu.org; 27 Mar 2023 13:14:11 +0000 Received: from localhost ([127.0.0.1]:46774 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pgmfq-0005Kz-Vo for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 09:14:11 -0400 Received: from lists.gnu.org ([209.51.188.17]:59458) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <koen.claessen@HIDDEN>) id 1pgivx-0004UW-6G for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 05:14:33 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <koen.claessen@HIDDEN>) id 1pgivw-0002tj-DW for bug-grep@HIDDEN; Mon, 27 Mar 2023 05:14:32 -0400 Received: from mail-oi1-f175.google.com ([209.85.167.175]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <koen.claessen@HIDDEN>) id 1pgivu-0000lO-J0 for bug-grep@HIDDEN; Mon, 27 Mar 2023 05:14:32 -0400 Received: by mail-oi1-f175.google.com with SMTP id bi31so5781083oib.9 for <bug-grep@HIDDEN>; Mon, 27 Mar 2023 02:14:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679908468; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=I0nsVRHdMBEFAIktPr2QorHZIFkZGpyQddZo2ESBGJ8=; b=2KbYRm6gB/nY5JRtjAjJaxhmjLHPSJBO8Cy78zB7X/9GPD5KOF/l9yexPGoJaB9nY1 43kJglMMfnl5tW61bsgJXRntfquVQhc5ksyWEkvo92vqmW6JbZZgr9CH99P7xAm1dzW9 q9/iMRMAttFeav8z95fj1lrvGNrCQaQ6x8r86bs0a2K5jbIvmOdNi6fpA5A97mNpMkuk MHlXMX630nG1TqFihmHmzmxhsEHn+EEoPH1ghbEXSVLfXsm+x97xq088/pRTPVPnEscC jBQhrDStI6y8q+HtPWQ4Nl7YIWMd/KADw3+N9UH/+CQDQcNipBBtyAc3Btc4wk6umvAY yQsw== X-Gm-Message-State: AO0yUKWR05lCph9vF+pSaIybDbqBKWRRVrJeEwzH0KBdtHCnzIspBUj/ U+hWrGx2XjyYZBmUJaJ1E2qYlJKqSkrFn9E2NfbZc/ug+Q4= X-Google-Smtp-Source: AK7set+FJwBk1ZfnerizAPa/0MtB+zNp+gLKRAc9psjQo/PlnuSfdZVsLPCMkXJvQJtu5+bxXOV+O9UTykY9L0q4dpc= X-Received: by 2002:a05:6808:b38:b0:387:1afd:5924 with SMTP id t24-20020a0568080b3800b003871afd5924mr2930420oij.8.1679908468569; Mon, 27 Mar 2023 02:14:28 -0700 (PDT) MIME-Version: 1.0 From: Koen Claessen <koen@HIDDEN> Date: Mon, 27 Mar 2023 11:14:17 +0200 Message-ID: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> Content-Type: multipart/alternative; boundary="000000000000863aec05f7de2b71" Received-SPF: pass client-ip=209.85.167.175; envelope-from=koen.claessen@HIDDEN; helo=mail-oi1-f175.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.25, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.1 (-) X-Mailman-Approved-At: Mon, 27 Mar 2023 09:14:05 -0400 X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -2.1 (--) --000000000000863aec05f7de2b71 Content-Type: text/plain; charset="UTF-8" Hello! Running the command: echo a | grep -E -w '((()|a)|())*' does not terminate, and uses a LOT of processor time, for all versions of grep I have tried. This is the smallest case that could be found; simplifying anything in the input and/or expression leads to correct behavior. Kind regards, /Koen --000000000000863aec05f7de2b71 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable <div dir=3D"ltr">Hello!<div><br></div><div>Running the command:</div><div><= br></div><div>=C2=A0=C2=A0echo a | grep -E -w '((()|a)|())*'</div><= div><br></div><div>does not terminate, and uses a LOT of processor time, fo= r all versions of grep I have tried.</div><div><br></div><div>This is the s= mallest case that could be found; simplifying anything in the input and/or = expression leads to correct behavior.</div><div><br></div><div>Kind regards= ,</div><div>/Koen</div></div> --000000000000863aec05f7de2b71--
Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) Content-Type: text/plain; charset=utf-8 X-Loop: help-debbugs@HIDDEN From: help-debbugs@HIDDEN (GNU bug Tracking System) To: Koen Claessen <koen@HIDDEN> Subject: bug#62483: Acknowledgement (echo a | grep -E -w '((()|a)|())*' # does not terminate) Message-ID: <handler.62483.B.167992285120532.ack <at> debbugs.gnu.org> References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> X-Gnu-PR-Message: ack 62483 X-Gnu-PR-Package: grep Reply-To: 62483 <at> debbugs.gnu.org Date: Mon, 27 Mar 2023 13:15:05 +0000 Thank you for filing a new bug report with debbugs.gnu.org. This is an automatically generated reply to let you know your message has been received. Your message is being forwarded to the package maintainers and other interested parties for their attention; they will reply in due course. Your message has been sent to the package maintainer(s): bug-grep@HIDDEN If you wish to submit further information on this problem, please send it to 62483 <at> debbugs.gnu.org. Please do not send mail to help-debbugs@HIDDEN unless you wish to report a problem with the Bug-tracking system. --=20 62483: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D62483 GNU Bug Tracking System Contact help-debbugs@HIDDEN with problems
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: Dimitry Andric <dimitry@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Mon, 27 Mar 2023 17:55:01 +0000 Resent-Message-ID: <handler.62483.B62483.167993968825341 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Koen Claessen <koen@HIDDEN> Cc: 62483 <at> debbugs.gnu.org Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.167993968825341 (code B ref 62483); Mon, 27 Mar 2023 17:55:01 +0000 Received: (at 62483) by debbugs.gnu.org; 27 Mar 2023 17:54:48 +0000 Received: from localhost ([127.0.0.1]:48396 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pgr3P-0006af-MB for submit <at> debbugs.gnu.org; Mon, 27 Mar 2023 13:54:48 -0400 Received: from tensor.andric.com ([87.251.56.140]:61183 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <dimitry@HIDDEN>) id 1pgr3O-0006aT-2P for 62483 <at> debbugs.gnu.org; Mon, 27 Mar 2023 13:54:47 -0400 Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id C41BC316CF; Mon, 27 Mar 2023 19:54:42 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904; t=1679939682; bh=v6pn1ABwmo2Zk9WlVVTSX2HlDTRrAfOORf4/PvUrqeQ=; h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From; b=L1SvqBeqaLZyAehXMuCh/SjMza9jiRzcc6wkwkKkhHpwRIOL4sLOoM6Adxphr00Mv xOL8dQszUSuyFlnwvqZUlYHmLFQ+pt73Y6zor1+WRuB4As5/OBjFlhOX0O8EEiel1a 4eGWJAZR/FmRt7E9t7txMFx7spGuIY+3VLv80eDqwp908gbxEcFBKDhSjNz4v5BUXV j9ThNMHLQeqo+ask6CkYXCsHyidfXdi9Sy6rn8F23AMb6RbhD4UoL0T+kZFM2bRcC6 7NCeb6tEcgKfRDGlNJgvpyBcdxyYTgE3iCgIP1L7BXpBhN5P6lA0j7K3OMeadi3Pc4 XR7NOl0WTc3pw== Content-Type: multipart/signed; boundary="Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.400.51.1.1\)) From: Dimitry Andric <dimitry@HIDDEN> In-Reply-To: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> Date: Mon, 27 Mar 2023 19:54:27 +0200 Message-Id: <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> X-Mailer: Apple Mail (2.3731.400.51.1.1) X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -1.0 (-) --Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: >=20 > Running the command: >=20 > echo a | grep -E -w '((()|a)|())*' >=20 > does not terminate, and uses a LOT of processor time, for all versions = of > grep I have tried. >=20 > This is the smallest case that could be found; simplifying anything in = the > input and/or expression leads to correct behavior. Reproducible with GNU grep 3.7 on Ubuntu 22: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ = COMMAND 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 = grep -E -w ((()|a)|())* It seems that at least on Ubuntu, grep in this mode uses glibc's = regexec(3), and it is that implementation which ends up in an endless = loop. It loops between lines 1415, 1417 and 1443, but idx and cur_node never = change from 0: 1378 static reg_errcode_t 1379 __attribute_warn_unused_result__ 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, = size_t nmatch, 1381 regmatch_t *pmatch, bool fl_backtrack) 1382 { ... 1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) 1416 { 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, = nmatch); 1418 1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) 1420 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) ... 1442 /* Proceed to next node. */ 1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, = prev_idx_match, 1444 &idx, cur_node, 1445 &eps_via_nodes, fs); 1446 1447 if (__glibc_unlikely (cur_node < 0)) ... 1465 } 1466 } -Dimitry P.S.: Interestingly this does not reproduce with BSD grep, which returns = immediately with "a". --Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.2 iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCHYUwAKCRCwXqMKLiCW o+9rAKD66Kl1mgPSqRVaG1i0Y05AnBOdBACfYshg0HktpRuLAQQBLyyAQHYSENU= =qC4Q -----END PGP SIGNATURE----- --Apple-Mail=_D6F6E370-5190-456A-AFDD-7684509FEA23--
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: arnold@HIDDEN Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Tue, 28 Mar 2023 06:48:01 +0000 Resent-Message-ID: <handler.62483.B62483.16799860275996 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: koen@HIDDEN, dimitry@HIDDEN Cc: 62483 <at> debbugs.gnu.org Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.16799860275996 (code B ref 62483); Tue, 28 Mar 2023 06:48:01 +0000 Received: (at 62483) by debbugs.gnu.org; 28 Mar 2023 06:47:07 +0000 Received: from localhost ([127.0.0.1]:48750 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1ph36p-0001Yd-2p for submit <at> debbugs.gnu.org; Tue, 28 Mar 2023 02:47:07 -0400 Received: from frenzy.freefriends.org ([198.99.81.75]:41354 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <arnold@HIDDEN>) id 1ph36m-0001YU-4b for 62483 <at> debbugs.gnu.org; Tue, 28 Mar 2023 02:47:05 -0400 X-Envelope-From: arnold@HIDDEN Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 32S6ksEG030678 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 28 Mar 2023 00:46:55 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 32S6krUr030677; Tue, 28 Mar 2023 00:46:53 -0600 From: arnold@HIDDEN Message-Id: <202303280646.32S6krUr030677@HIDDEN> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@HIDDEN using -f Date: Tue, 28 Mar 2023 00:46:53 -0600 References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> In-Reply-To: <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: 1.0 (+) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: 0.0 (/) This does not reproduce with gawk, even when I force use of the regex matcher. What happens if grep is built with the included regex files instead of relying on the ones in the local glibc? Arnold Dimitry Andric <dimitry@HIDDEN> wrote: > On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: > > > > Running the command: > > > > echo a | grep -E -w '((()|a)|())*' > > > > does not terminate, and uses a LOT of processor time, for all versions of > > grep I have tried. > > > > This is the smallest case that could be found; simplifying anything in the > > input and/or expression leads to correct behavior. > > Reproducible with GNU grep 3.7 on Ubuntu 22: > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())* > > It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop. > > It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0: > > 1378 static reg_errcode_t > 1379 __attribute_warn_unused_result__ > 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch, > 1381 regmatch_t *pmatch, bool fl_backtrack) > 1382 { > ... > 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > 1416 { > 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > 1418 > 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > ... > 1442 /* Proceed to next node. */ > 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, > 1444 &idx, cur_node, > 1445 &eps_via_nodes, fs); > 1446 > 1447 if (__glibc_unlikely (cur_node < 0)) > ... > 1465 } > 1466 } > > -Dimitry > > P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a". >
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: Dimitry Andric <dimitry@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Fri, 31 Mar 2023 15:26:02 +0000 Resent-Message-ID: <handler.62483.B.168027635127847 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: arnold@HIDDEN Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN Received: via spool by submit <at> debbugs.gnu.org id=B.168027635127847 (code B ref -1); Fri, 31 Mar 2023 15:26:02 +0000 Received: (at submit) by debbugs.gnu.org; 31 Mar 2023 15:25:51 +0000 Received: from localhost ([127.0.0.1]:34257 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1piGdT-0007F5-3t for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:51 -0400 Received: from lists.gnu.org ([209.51.188.17]:56940) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <dimitry@HIDDEN>) id 1piGdN-0007Et-MP for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>) id 1piGdL-0008Od-T2 for bug-grep@HIDDEN; Fri, 31 Mar 2023 11:25:45 -0400 Received: from tensor.andric.com ([87.251.56.140]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>) id 1piGdJ-0007lc-Tf for bug-grep@HIDDEN; Fri, 31 Mar 2023 11:25:43 -0400 Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id 559306059D; Fri, 31 Mar 2023 17:25:37 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904; t=1680276337; bh=tv3wdsLbUzJuawKCrRN9gRq8xh79liGVtP2VlXgo/zE=; h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From; b=Nz2cHPImbM0SK6j1NdD5Aygk3RL4epuqwPgTaTrK8MurzsrOZO/aL+KQbRgKDBhbj aplfthjlXLsiJZDVL2pSxwJVm4yxAp7BR65GpLgbw2PUwYPVGA4Hx3T4CCKcuH/M+R p85inkdVWLNGlChp1juhxPiFh2YP1gtM31fXWxnU8876mdCg1/vBX1C4O3bYA9g7eA cQkApkFp7WRvHQpBYsuiYsFA1e5QyXFSbf6ARiPW5O9EWj/MxPeymomqcOYvoj+k43 /NU1qEh1dbeXftTI+hKE+QfbZO3MWzD3rSKS1tz0nRW8/kPQB2qwKW6ZDDWY5lwk42 JgFZiA0j/f23g== Content-Type: multipart/signed; boundary="Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\)) From: Dimitry Andric <dimitry@HIDDEN> In-Reply-To: <202303280646.32S6krUr030677@HIDDEN> Date: Fri, 31 Mar 2023 17:25:21 +0200 Message-Id: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> X-Mailer: Apple Mail (2.3731.500.231) Received-SPF: pass client-ip=87.251.56.140; envelope-from=dimitry@HIDDEN; helo=tensor.andric.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.4 (-) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -2.4 (--) --Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Yes, it still reproduces when I configure the latest grep using --without-included-regex: 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) 1: idx =3D 0 (gdb) 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, = nmatch); 1: idx =3D 0 (gdb) 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) 1: idx =3D 0 (gdb) 1405 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) 1: idx =3D 0 (gdb) 1428 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, = prev_idx_match, 1: idx =3D 0 (gdb) 1432 if (__glibc_unlikely (cur_node < 0)) 1: idx =3D 0 (gdb) 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) 1: idx =3D 0 (gdb) 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, = nmatch); 1: idx =3D 0 (gdb) 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) 1: idx =3D 0 (gdb) 1405 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) 1: idx =3D 0 The endless loop looks the same. grep's regexec.c is mostly the same as glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N directives added. -Dimitry > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote: >=20 > This does not reproduce with gawk, even when I force use of the regex > matcher. >=20 > What happens if grep is built with the included regex files instead of > relying on the ones in the local glibc? >=20 > Arnold >=20 > Dimitry Andric <dimitry@HIDDEN> wrote: >=20 >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: >>>=20 >>> Running the command: >>>=20 >>> echo a | grep -E -w '((()|a)|())*' >>>=20 >>> does not terminate, and uses a LOT of processor time, for all = versions of >>> grep I have tried. >>>=20 >>> This is the smallest case that could be found; simplifying anything = in the >>> input and/or expression leads to correct behavior. >>=20 >> Reproducible with GNU grep 3.7 on Ubuntu 22: >>=20 >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ = COMMAND >> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 = grep -E -w ((()|a)|())* >>=20 >> It seems that at least on Ubuntu, grep in this mode uses glibc's = regexec(3), and it is that implementation which ends up in an endless = loop. >>=20 >> It loops between lines 1415, 1417 and 1443, but idx and cur_node = never change from 0: >>=20 >> 1378 static reg_errcode_t >> 1379 __attribute_warn_unused_result__ >> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, = size_t nmatch, >> 1381 regmatch_t *pmatch, bool fl_backtrack) >> 1382 { >> ... >> 1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) >> 1416 { >> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, = nmatch); >> 1418 >> 1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) >> 1420 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) >> ... >> 1442 /* Proceed to next node. */ >> 1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, = prev_idx_match, >> 1444 &idx, cur_node, >> 1445 &eps_via_nodes, fs); >> 1446 >> 1447 if (__glibc_unlikely (cur_node < 0)) >> ... >> 1465 } >> 1466 } >>=20 >> -Dimitry >>=20 >> P.S.: Interestingly this does not reproduce with BSD grep, which = returns immediately with "a". >>=20 --Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.2 iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCb7YQAKCRCwXqMKLiCW o+dZAJoC/FUUV4nkFDYncYF4RZt9TDj6FACfZpp9LHb3L1pn/aWcAXlxCWJKnU0= =R/1i -----END PGP SIGNATURE----- --Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2--
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: Dimitry Andric <dimitry@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Fri, 31 Mar 2023 15:26:02 +0000 Resent-Message-ID: <handler.62483.B62483.168027634527827 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: arnold@HIDDEN Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168027634527827 (code B ref 62483); Fri, 31 Mar 2023 15:26:02 +0000 Received: (at 62483) by debbugs.gnu.org; 31 Mar 2023 15:25:45 +0000 Received: from localhost ([127.0.0.1]:34254 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1piGdM-0007El-K8 for submit <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:45 -0400 Received: from tensor.andric.com ([87.251.56.140]:64446 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <dimitry@HIDDEN>) id 1piGdI-0007EW-7c for 62483 <at> debbugs.gnu.org; Fri, 31 Mar 2023 11:25:42 -0400 Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id 559306059D; Fri, 31 Mar 2023 17:25:37 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904; t=1680276337; bh=tv3wdsLbUzJuawKCrRN9gRq8xh79liGVtP2VlXgo/zE=; h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From; b=Nz2cHPImbM0SK6j1NdD5Aygk3RL4epuqwPgTaTrK8MurzsrOZO/aL+KQbRgKDBhbj aplfthjlXLsiJZDVL2pSxwJVm4yxAp7BR65GpLgbw2PUwYPVGA4Hx3T4CCKcuH/M+R p85inkdVWLNGlChp1juhxPiFh2YP1gtM31fXWxnU8876mdCg1/vBX1C4O3bYA9g7eA cQkApkFp7WRvHQpBYsuiYsFA1e5QyXFSbf6ARiPW5O9EWj/MxPeymomqcOYvoj+k43 /NU1qEh1dbeXftTI+hKE+QfbZO3MWzD3rSKS1tz0nRW8/kPQB2qwKW6ZDDWY5lwk42 JgFZiA0j/f23g== Content-Type: multipart/signed; boundary="Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\)) From: Dimitry Andric <dimitry@HIDDEN> In-Reply-To: <202303280646.32S6krUr030677@HIDDEN> Date: Fri, 31 Mar 2023 17:25:21 +0200 Message-Id: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> X-Mailer: Apple Mail (2.3731.500.231) X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -1.0 (-) --Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Yes, it still reproduces when I configure the latest grep using --without-included-regex: 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) 1: idx =3D 0 (gdb) 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, = nmatch); 1: idx =3D 0 (gdb) 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) 1: idx =3D 0 (gdb) 1405 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) 1: idx =3D 0 (gdb) 1428 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, = prev_idx_match, 1: idx =3D 0 (gdb) 1432 if (__glibc_unlikely (cur_node < 0)) 1: idx =3D 0 (gdb) 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) 1: idx =3D 0 (gdb) 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, = nmatch); 1: idx =3D 0 (gdb) 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) 1: idx =3D 0 (gdb) 1405 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) 1: idx =3D 0 The endless loop looks the same. grep's regexec.c is mostly the same as glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N directives added. -Dimitry > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote: >=20 > This does not reproduce with gawk, even when I force use of the regex > matcher. >=20 > What happens if grep is built with the included regex files instead of > relying on the ones in the local glibc? >=20 > Arnold >=20 > Dimitry Andric <dimitry@HIDDEN> wrote: >=20 >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: >>>=20 >>> Running the command: >>>=20 >>> echo a | grep -E -w '((()|a)|())*' >>>=20 >>> does not terminate, and uses a LOT of processor time, for all = versions of >>> grep I have tried. >>>=20 >>> This is the smallest case that could be found; simplifying anything = in the >>> input and/or expression leads to correct behavior. >>=20 >> Reproducible with GNU grep 3.7 on Ubuntu 22: >>=20 >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ = COMMAND >> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 = grep -E -w ((()|a)|())* >>=20 >> It seems that at least on Ubuntu, grep in this mode uses glibc's = regexec(3), and it is that implementation which ends up in an endless = loop. >>=20 >> It loops between lines 1415, 1417 and 1443, but idx and cur_node = never change from 0: >>=20 >> 1378 static reg_errcode_t >> 1379 __attribute_warn_unused_result__ >> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, = size_t nmatch, >> 1381 regmatch_t *pmatch, bool fl_backtrack) >> 1382 { >> ... >> 1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) >> 1416 { >> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, = nmatch); >> 1418 >> 1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) >> 1420 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) >> ... >> 1442 /* Proceed to next node. */ >> 1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, = prev_idx_match, >> 1444 &idx, cur_node, >> 1445 &eps_via_nodes, fs); >> 1446 >> 1447 if (__glibc_unlikely (cur_node < 0)) >> ... >> 1465 } >> 1466 } >>=20 >> -Dimitry >>=20 >> P.S.: Interestingly this does not reproduce with BSD grep, which = returns immediately with "a". >>=20 --Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.2 iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZCb7YQAKCRCwXqMKLiCW o+dZAJoC/FUUV4nkFDYncYF4RZt9TDj6FACfZpp9LHb3L1pn/aWcAXlxCWJKnU0= =R/1i -----END PGP SIGNATURE----- --Apple-Mail=_3EB2C97A-5F5F-4754-9624-FDC38F6B0BB2--
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: Jim Meyering <jim@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Sun, 02 Apr 2023 04:16:02 +0000 Resent-Message-ID: <handler.62483.B62483.168040892927060 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Koen Claessen <koen@HIDDEN> Cc: 62483 <at> debbugs.gnu.org Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168040892927060 (code B ref 62483); Sun, 02 Apr 2023 04:16:02 +0000 Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 04:15:29 +0000 Received: from localhost ([127.0.0.1]:38815 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pip7p-00072N-2s for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 00:15:29 -0400 Received: from mail-lj1-f177.google.com ([209.85.208.177]:41554) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <meyering@HIDDEN>) id 1pip7m-000728-2d for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 00:15:27 -0400 Received: by mail-lj1-f177.google.com with SMTP id bx10so8705350ljb.8 for <62483 <at> debbugs.gnu.org>; Sat, 01 Apr 2023 21:15:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680408920; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=DodkkdU3opFDpkTR7I38Cd8sH5Ml6OzzLr47sEPUB+w=; b=xz/uM78Xeo+KZkaGtVa1P9mQjfLV1aivcei2WEfZNoOFMTOkBwpKonRG/ba1JbNr83 pYR+P1Z7IOyoWVpnhev0bwAe6bG4XFDHsU+cW10ImcfNNcmiK1ArWvyofz13j+DNKQ5n /vWdkbrQkfucGebgx/sLK7Jdh8OA0tZt1gbZNJnPjeTPcopGYPDCigA2gGQoEBVawv7g 4x6kAG7avDpCkKtc8U9mnvqk+UI472MyheOtQNwoE3M/v2Hv8m9zsDYT6bHBxxGdDYle 4lcLUHMuIgzGnfkj3/hevOpflvPGZxjPyKTr8WEjYK3ZapFlvI9VQTWpnbJK5hO6Xz77 hAxw== X-Gm-Message-State: AAQBX9e7GYXZxDzQFlrGuxweB9jVGPGIK9j3o02zJXm7N/ETVFXjGl10 mYjRz0gfA7+RtfFFOJlo5APyBjtu4o0gOz8KPHQ= X-Google-Smtp-Source: AKy350ZblXwqXc5nXQDfPymdw4nPR2IN4imJcGePAqD1vZp9OXVu/8aqbvMVAAgh/JkXmi0+y0PDQvbteeLWQRYc1Jg= X-Received: by 2002:a2e:95d8:0:b0:29a:9053:ed1b with SMTP id y24-20020a2e95d8000000b0029a9053ed1bmr9738568ljh.3.1680408919857; Sat, 01 Apr 2023 21:15:19 -0700 (PDT) MIME-Version: 1.0 References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> In-Reply-To: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> From: Jim Meyering <jim@HIDDEN> Date: Sat, 1 Apr 2023 21:15:07 -0700 Message-ID: <CA+8g5KHc9W4dBJufWv96wd53ZE4fcYxrynuUEh9vNR-_1LFYbQ@HIDDEN> Content-Type: multipart/mixed; boundary="000000000000bfb5b205f852b0a6" X-Spam-Score: 0.2 (/) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -0.8 (/) --000000000000bfb5b205f852b0a6 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Mar 27, 2023 at 6:15=E2=80=AFAM Koen Claessen <koen@HIDDEN> wr= ote: > Running the command: > > echo a | grep -E -w '((()|a)|())*' > > does not terminate, and uses a LOT of processor time, for all versions of > grep I have tried. > > This is the smallest case that could be found; simplifying anything in th= e > input and/or expression leads to correct behavior. Thank you! How did you find that? FYI, this strikes grep-3.10 (on Fedora 37/glibc-2.36-9.fc37.x86_64) when using LC_ALL=3Den_US.UTF-8, but not with LC_ALL=3DC. I.e., this infloops: echo a | LC_ALL=3Den_US.UTF-8 grep -E -w '((()|a)|())*' but this works as expected and promptly prints its line of input: echo a | LC_ALL=3DC grep -E -w '((()|a)|())*' For now, I've added an expected-failing test case for this bug: --000000000000bfb5b205f852b0a6 Content-Type: application/octet-stream; name="grep-glibc-infloop.patch" Content-Disposition: attachment; filename="grep-glibc-infloop.patch" Content-Transfer-Encoding: base64 Content-ID: <f_lfyw0lqb0> X-Attachment-Id: f_lfyw0lqb0 RnJvbSBlNDQ4ZTg2YjVjMDFjMGE5OGIxNmU4NGQ2MWVkZTc5NDAxNzBiYjViIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog U2F0LCAxIEFwciAyMDIzIDIxOjAxOjAxIC0wNzAwClN1YmplY3Q6IFtQQVRDSF0gdGVzdHM6IGFk ZCBhIGtub3duLWZhaWxpbmcgZ2xpYmMtaW5mbG9vcCB0ZXN0CgoqIHRlc3RzL2dsaWJjLWluZmxv b3A6IE5ldyBmaWxlLgpCYXNlZCBvbiB0aGUgY29tbWFuZCBmcm9tIEtvZW4gQ2xhZXNzZW4KcmVw b3J0ZWQgaW4gaHR0cHM6Ly9idWdzLmdudS5vcmcvNjI0ODMKKiB0ZXN0cy9NYWtlZmlsZS5hbSAo VEVTVFMpOiBBZGQgdGhlIGZpbGUgbmFtZQoqIFRIQU5LUy5pbjogQWRkIG5hbWUgb2YgcmVwb3J0 ZXIuCi0tLQogVEhBTktTLmluICAgICAgICAgICB8ICAxICsKIHRlc3RzL01ha2VmaWxlLmFtICAg fCAgMyArKysKIHRlc3RzL2dsaWJjLWluZmxvb3AgfCAxMyArKysrKysrKysrKysrCiAzIGZpbGVz IGNoYW5nZWQsIDE3IGluc2VydGlvbnMoKykKIGNyZWF0ZSBtb2RlIDEwMDc1NSB0ZXN0cy9nbGli Yy1pbmZsb29wCgpkaWZmIC0tZ2l0IGEvVEhBTktTLmluIGIvVEhBTktTLmluCmluZGV4IGZkMzYx OTIuLjA2ZjhjNTQgMTAwNjQ0Ci0tLSBhL1RIQU5LUy5pbgorKysgYi9USEFOS1MuaW4KQEAgLTU2 LDYgKzU2LDcgQEAgS2FybCBQZXR0ZXJzc29uICAgICAgICAgICAgICAgICAgICAga2FybC5wZXR0 ZXJzc29uQGtscG4uc2UKIEthdmVoIFIuIEdoYXppICAgICAgICAgICAgICAgICAgICAgIGdoYXpp QGNhaXAucnV0Z2Vycy5lZHUKIEthenVybyBGdXJ1a2F3YSAgICAgICAgICAgICAgICAgICAgIGZ1 cnVrYXdhQGFwcmljb3Qua2VrLmpwCiBLZWl0aCBCb3N0aWMgICAgICAgICAgICAgICAgICAgICAg ICBib3N0aWNAYnNkaS5jb20KK0tvZW4gQ2xhZXNzZW4gICAgICAgICAgICAgICAgICAgICAgIGtv ZW5AY2hhbG1lcnMuc2UKIEtyaXNobmEgU2V0aHVyYW1hbiAgICAgICAgICAgICAgICAgIGtyaXNo bmFAc2dpaHViLmNvcnAuc2dpLmNvbQogS3VydCBEIFNjaHdlaHIgICAgICAgICAgICAgICAgICAg ICAga2RzY2h3ZWhAaW5zY2kxNC51Y3NkLmVkdQogTHVkb3ZpYyBDb3VydMOocyAgICAgICAgICAg ICAgICAgICAgIGx1ZG9AZ251Lm9yZwpkaWZmIC0tZ2l0IGEvdGVzdHMvTWFrZWZpbGUuYW0gYi90 ZXN0cy9NYWtlZmlsZS5hbQppbmRleCBmMTk1YzhkLi4wYzNhZGJlIDEwMDY0NAotLS0gYS90ZXN0 cy9NYWtlZmlsZS5hbQorKysgYi90ZXN0cy9NYWtlZmlsZS5hbQpAQCAtNDksNiArNDksOCBAQCBM REFERCA9IC4uL2xpYi9saWJncmVwdXRpbHMuYSAkKExJQklOVEwpIC4uL2xpYi9saWJncmVwdXRp bHMuYQogIyBGSVhNRS0yMDE1OiBSZW1vdmUgdGhpcyBvbmNlIHRoZSBnbGliYyBhbmQgZ251bGli IGJ1Z3MgYXJlIGZpeGVkLgogWEZBSUxfVEVTVFMgPSB0cmlwbGUtYmFja3JlZgoKK1hGQUlMX1RF U1RTICs9IGdsaWJjLWluZmxvb3AKKwogIyBFcXVpdmFsZW5jZSBjbGFzc2VzIGFyZSBvbmx5IHN1 cHBvcnRlZCB3aGVuIHVzaW5nIHRoZSBzeXN0ZW0KICMgbWF0Y2hlciAod2hpY2ggbWVhbnMgb25s eSB3aXRoIGdsaWJjKS4KICMgVGhlIGluY2x1ZGVkIG1hdGNoZXIgbmVlZHMgdG8gYmUgZml4ZWQu CkBAIC0xMDgsNiArMTEwLDcgQEAgVEVTVFMgPQkJCQkJCVwKICAgZmlsbGJ1Zi1sb25nLWxpbmUJ CQkJXAogICBmbWJ0ZXN0CQkJCQlcCiAgIGZvYWQxCQkJCQkJXAorICBnbGliYy1pbmZsb29wCQkJ CQlcCiAgIGdyZXAtZGV2LW51bGwJCQkJCVwKICAgZ3JlcC1kZXYtbnVsbC1vdXQJCQkJXAogICBn cmVwLWRpcgkJCQkJXApkaWZmIC0tZ2l0IGEvdGVzdHMvZ2xpYmMtaW5mbG9vcCBiL3Rlc3RzL2ds aWJjLWluZmxvb3AKbmV3IGZpbGUgbW9kZSAxMDA3NTUKaW5kZXggMDAwMDAwMC4uNjI0NGI2Ygot LS0gL2Rldi9udWxsCisrKyBiL3Rlc3RzL2dsaWJjLWluZmxvb3AKQEAgLTAsMCArMSwxMyBAQAor IyEvYmluL3NoCisjIFRoaXMgd291bGQgaW5mbG9vcCB3aGVuIHVzaW5nIGdsaWJjJ3MgcmVnZXgg YXQgbGVhc3QgdW50aWwgZ2xpYmMtMi4zNi4KKy4gIiR7c3JjZGlyPS59L2luaXQuc2giOyBwYXRo X3ByZXBlbmRfIC4uL3NyYworCityZXF1aXJlX3RpbWVvdXRfCityZXF1aXJlX2VuX3V0ZjhfbG9j YWxlXworCitmYWlsPTAKKworZWNobyBhID4gaW4gfHwgZnJhbWV3b3JrX2ZhaWx1cmVfCityZXR1 cm5zXyAwIHRpbWVvdXQgMiBlbnYgTENfQUxMPWVuX1VTLlVURi04IGdyZXAgLUUgLXcgJygoKCl8 YSl8KCkpKicgaW4gfHwgZmFpbD0xCisKK0V4aXQgJGZhaWwKLS0gCjIuNDAuMC4xNjYuZzE0MGI5 NDc4ZGEKCg== --000000000000bfb5b205f852b0a6--
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: arnold@HIDDEN Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Sun, 02 Apr 2023 06:53:01 +0000 Resent-Message-ID: <handler.62483.B.168041835113395 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: dimitry@HIDDEN, arnold@HIDDEN Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN Received: via spool by submit <at> debbugs.gnu.org id=B.168041835113395 (code B ref -1); Sun, 02 Apr 2023 06:53:01 +0000 Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 06:52:31 +0000 Received: from localhost ([127.0.0.1]:39035 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pirZm-0003Tz-R9 for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:31 -0400 Received: from lists.gnu.org ([209.51.188.17]:50340) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <arnold@HIDDEN>) id 1pirZk-0003To-83 for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:28 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1pirZh-0008FE-Kz for bug-grep@HIDDEN; Sun, 02 Apr 2023 02:52:25 -0400 Received: from frenzy.freefriends.org ([198.99.81.75] helo=freefriends.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1pirZf-0006yO-Kg for bug-grep@HIDDEN; Sun, 02 Apr 2023 02:52:25 -0400 X-Envelope-From: arnold@HIDDEN Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 3326q3Cs024467 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 2 Apr 2023 00:52:04 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 3326q2Ut024466; Sun, 2 Apr 2023 00:52:02 -0600 From: arnold@HIDDEN Message-Id: <202304020652.3326q2Ut024466@HIDDEN> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@HIDDEN using -f Date: Sun, 02 Apr 2023 00:52:02 -0600 References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> In-Reply-To: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Received-SPF: none client-ip=198.99.81.75; envelope-from=arnold@HIDDEN; helo=freefriends.org X-Spam_score_int: -14 X-Spam_score: -1.5 X-Spam_bar: - X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, FORGED_SPF_HELO=0.001, KHOP_HELO_FCRDNS=0.399, SPF_HELO_PASS=-0.001, SPF_NONE=0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) Hi. Dimitry Andric <dimitry@HIDDEN> wrote: > Yes, it still reproduces when I configure the latest grep using > --without-included-regex: I assume you meant --with-included-regex? If you really used --without-included-regex, that doesn't prove anything... :-) It's interesting, as gawk uses the same regex, but with different flags. It might be worth trying to understand which of the syntax bits is causing this. Thanks, Arnold > > 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > 1: idx = 0 > (gdb) > 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > 1: idx = 0 > (gdb) > 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > 1: idx = 0 > (gdb) > 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > 1: idx = 0 > (gdb) > 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, > 1: idx = 0 > (gdb) > 1432 if (__glibc_unlikely (cur_node < 0)) > 1: idx = 0 > (gdb) > 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > 1: idx = 0 > (gdb) > 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > 1: idx = 0 > (gdb) > 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > 1: idx = 0 > (gdb) > 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > 1: idx = 0 > > The endless loop looks the same. grep's regexec.c is mostly the same as > glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N > directives added. > > -Dimitry > > > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote: > > > > This does not reproduce with gawk, even when I force use of the regex > > matcher. > > > > What happens if grep is built with the included regex files instead of > > relying on the ones in the local glibc? > > > > Arnold > > > > Dimitry Andric <dimitry@HIDDEN> wrote: > > > >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: > >>> > >>> Running the command: > >>> > >>> echo a | grep -E -w '((()|a)|())*' > >>> > >>> does not terminate, and uses a LOT of processor time, for all versions of > >>> grep I have tried. > >>> > >>> This is the smallest case that could be found; simplifying anything in the > >>> input and/or expression leads to correct behavior. > >> > >> Reproducible with GNU grep 3.7 on Ubuntu 22: > >> > >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > >> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())* > >> > >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop. > >> > >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0: > >> > >> 1378 static reg_errcode_t > >> 1379 __attribute_warn_unused_result__ > >> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch, > >> 1381 regmatch_t *pmatch, bool fl_backtrack) > >> 1382 { > >> ... > >> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > >> 1416 { > >> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > >> 1418 > >> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > >> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > >> ... > >> 1442 /* Proceed to next node. */ > >> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, > >> 1444 &idx, cur_node, > >> 1445 &eps_via_nodes, fs); > >> 1446 > >> 1447 if (__glibc_unlikely (cur_node < 0)) > >> ... > >> 1465 } > >> 1466 } > >> > >> -Dimitry > >> > >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a". > >> >
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: arnold@HIDDEN Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Sun, 02 Apr 2023 06:53:02 +0000 Resent-Message-ID: <handler.62483.B62483.168041834213373 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: dimitry@HIDDEN, arnold@HIDDEN Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168041834213373 (code B ref 62483); Sun, 02 Apr 2023 06:53:02 +0000 Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 06:52:22 +0000 Received: from localhost ([127.0.0.1]:39032 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pirZe-0003Td-3X for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:22 -0400 Received: from frenzy.freefriends.org ([198.99.81.75]:57092 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <arnold@HIDDEN>) id 1pirZb-0003TU-Kl for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 02:52:20 -0400 X-Envelope-From: arnold@HIDDEN Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 3326q3Cs024467 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 2 Apr 2023 00:52:04 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 3326q2Ut024466; Sun, 2 Apr 2023 00:52:02 -0600 From: arnold@HIDDEN Message-Id: <202304020652.3326q2Ut024466@HIDDEN> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@HIDDEN using -f Date: Sun, 02 Apr 2023 00:52:02 -0600 References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> In-Reply-To: <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: 1.0 (+) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: 0.0 (/) Hi. Dimitry Andric <dimitry@HIDDEN> wrote: > Yes, it still reproduces when I configure the latest grep using > --without-included-regex: I assume you meant --with-included-regex? If you really used --without-included-regex, that doesn't prove anything... :-) It's interesting, as gawk uses the same regex, but with different flags. It might be worth trying to understand which of the syntax bits is causing this. Thanks, Arnold > > 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > 1: idx = 0 > (gdb) > 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > 1: idx = 0 > (gdb) > 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > 1: idx = 0 > (gdb) > 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > 1: idx = 0 > (gdb) > 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, > 1: idx = 0 > (gdb) > 1432 if (__glibc_unlikely (cur_node < 0)) > 1: idx = 0 > (gdb) > 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > 1: idx = 0 > (gdb) > 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > 1: idx = 0 > (gdb) > 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > 1: idx = 0 > (gdb) > 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > 1: idx = 0 > > The endless loop looks the same. grep's regexec.c is mostly the same as > glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N > directives added. > > -Dimitry > > > On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote: > > > > This does not reproduce with gawk, even when I force use of the regex > > matcher. > > > > What happens if grep is built with the included regex files instead of > > relying on the ones in the local glibc? > > > > Arnold > > > > Dimitry Andric <dimitry@HIDDEN> wrote: > > > >> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: > >>> > >>> Running the command: > >>> > >>> echo a | grep -E -w '((()|a)|())*' > >>> > >>> does not terminate, and uses a LOT of processor time, for all versions of > >>> grep I have tried. > >>> > >>> This is the smallest case that could be found; simplifying anything in the > >>> input and/or expression leads to correct behavior. > >> > >> Reproducible with GNU grep 3.7 on Ubuntu 22: > >> > >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > >> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())* > >> > >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop. > >> > >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0: > >> > >> 1378 static reg_errcode_t > >> 1379 __attribute_warn_unused_result__ > >> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch, > >> 1381 regmatch_t *pmatch, bool fl_backtrack) > >> 1382 { > >> ... > >> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > >> 1416 { > >> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > >> 1418 > >> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > >> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > >> ... > >> 1442 /* Proceed to next node. */ > >> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, > >> 1444 &idx, cur_node, > >> 1445 &eps_via_nodes, fs); > >> 1446 > >> 1447 if (__glibc_unlikely (cur_node < 0)) > >> ... > >> 1465 } > >> 1466 } > >> > >> -Dimitry > >> > >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a". > >> >
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: Dimitry Andric <dimitry@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Sun, 02 Apr 2023 09:34:02 +0000 Resent-Message-ID: <handler.62483.B62483.168042803431388 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: arnold@HIDDEN Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168042803431388 (code B ref 62483); Sun, 02 Apr 2023 09:34:02 +0000 Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 09:33:54 +0000 Received: from localhost ([127.0.0.1]:39224 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1piu5x-0008AC-Vd for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:33:54 -0400 Received: from tensor.andric.com ([87.251.56.140]:53462 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <dimitry@HIDDEN>) id 1piu5u-0008A0-Gk for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:33:52 -0400 Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id 6ACDF6C235; Sun, 2 Apr 2023 11:33:48 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904; t=1680428028; bh=j++piMyInCb64AqGnHXFa/agOcQZYX5hFJCloNa82XE=; h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From; b=OvpoxsYZyxsgQkhRl/xlrsYK62rT5hKW4sMbC0u3aDCYe7gYbalcM0EcbVcuZmsEp Ajq9eUTN6tzSLEXBOVFlEwFzoS/gkuLXqmKFqU4TKOJpsjfoXca+iNS3RN7w7k6HAP FyHBj1TZpHtlFZQW6GWjX1LTGv9+R+XmXi7cSlUAHzDwjONBsFO40E29XbZDoqHYdD xzUcUJSN1hpy7an+xzhBje/vvne3qpIEbReGf/Aab0z0j/KCjM6GT81+SKIog+el7H vLVb8tU69x6+y1CyoGlsfZ0IY17+aUdgLalblC2o7G4l8UObQXMmN1jOZ7QLq9QPgr OWfWv7UbnXYdg== Content-Type: multipart/signed; boundary="Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\)) From: Dimitry Andric <dimitry@HIDDEN> In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN> Date: Sun, 2 Apr 2023 11:33:42 +0200 Message-Id: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN> References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> <202304020652.3326q2Ut024466@HIDDEN> X-Mailer: Apple Mail (2.3731.500.231) X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -1.0 (-) --Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Ah sorry, I did indeed rebuild grep with --with-included-regex, and for debugging purposes added CFLAGS=3D"-O0 -g". In any case, the problematic code is both in glibc and grep, as I believe these are originating from the same source. -Dimitry > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote: >=20 > Hi. >=20 > Dimitry Andric <dimitry@HIDDEN> wrote: >=20 >> Yes, it still reproduces when I configure the latest grep using >> --without-included-regex: >=20 > I assume you meant --with-included-regex? >=20 > If you really used --without-included-regex, that doesn't prove = anything... :-) >=20 > It's interesting, as gawk uses the same regex, but with different = flags. > It might be worth trying to understand which of the syntax bits > is causing this. >=20 > Thanks, >=20 > Arnold >=20 >>=20 >> 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) >> 1: idx =3D 0 >> (gdb) >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, = idx, nmatch); >> 1: idx =3D 0 >> (gdb) >> 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) >> 1: idx =3D 0 >> (gdb) >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) >> 1: idx =3D 0 >> (gdb) >> 1428 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, = prev_idx_match, >> 1: idx =3D 0 >> (gdb) >> 1432 if (__glibc_unlikely (cur_node < 0)) >> 1: idx =3D 0 >> (gdb) >> 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) >> 1: idx =3D 0 >> (gdb) >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, = idx, nmatch); >> 1: idx =3D 0 >> (gdb) >> 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) >> 1: idx =3D 0 >> (gdb) >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) >> 1: idx =3D 0 >>=20 >> The endless loop looks the same. grep's regexec.c is mostly the same = as >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N >> directives added. >>=20 >> -Dimitry >>=20 >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote: >>>=20 >>> This does not reproduce with gawk, even when I force use of the = regex >>> matcher. >>>=20 >>> What happens if grep is built with the included regex files instead = of >>> relying on the ones in the local glibc? >>>=20 >>> Arnold >>>=20 >>> Dimitry Andric <dimitry@HIDDEN> wrote: >>>=20 >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: >>>>>=20 >>>>> Running the command: >>>>>=20 >>>>> echo a | grep -E -w '((()|a)|())*' >>>>>=20 >>>>> does not terminate, and uses a LOT of processor time, for all = versions of >>>>> grep I have tried. >>>>>=20 >>>>> This is the smallest case that could be found; simplifying = anything in the >>>>> input and/or expression leads to correct behavior. >>>>=20 >>>> Reproducible with GNU grep 3.7 on Ubuntu 22: >>>>=20 >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM = TIME+ COMMAND >>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 = 0:08.32 grep -E -w ((()|a)|())* >>>>=20 >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's = regexec(3), and it is that implementation which ends up in an endless = loop. >>>>=20 >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node = never change from 0: >>>>=20 >>>> 1378 static reg_errcode_t >>>> 1379 __attribute_warn_unused_result__ >>>> 1380 set_regs (const regex_t *preg, const re_match_context_t = *mctx, size_t nmatch, >>>> 1381 regmatch_t *pmatch, bool fl_backtrack) >>>> 1382 { >>>> ... >>>> 1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) >>>> 1416 { >>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, = idx, nmatch); >>>> 1418 >>>> 1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) >>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) >>>> ... >>>> 1442 /* Proceed to next node. */ >>>> 1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, = prev_idx_match, >>>> 1444 &idx, cur_node, >>>> 1445 &eps_via_nodes, fs); >>>> 1446 >>>> 1447 if (__glibc_unlikely (cur_node < 0)) >>>> ... >>>> 1465 } >>>> 1466 } >>>>=20 >>>> -Dimitry >>>>=20 >>>> P.S.: Interestingly this does not reproduce with BSD grep, which = returns immediately with "a". >>>>=20 >>=20 --Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.2 iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZClL9gAKCRCwXqMKLiCW o0KTAKCiIhcduUQaGUuiJfykNQNnR6+9UACg7EEpahAec8VtdrBL/qhtw0BjprA= =F/9f -----END PGP SIGNATURE----- --Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13--
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: Dimitry Andric <dimitry@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Sun, 02 Apr 2023 09:34:02 +0000 Resent-Message-ID: <handler.62483.B.168042804231417 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: arnold@HIDDEN Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, Koen Claessen <koen@HIDDEN>, bug-grep@HIDDEN Received: via spool by submit <at> debbugs.gnu.org id=B.168042804231417 (code B ref -1); Sun, 02 Apr 2023 09:34:02 +0000 Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 09:34:02 +0000 Received: from localhost ([127.0.0.1]:39227 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1piu65-0008AY-Hx for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:34:02 -0400 Received: from lists.gnu.org ([209.51.188.17]:45166) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <dimitry@HIDDEN>) id 1piu64-0008AQ-1g for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 05:34:00 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>) id 1piu62-0002Ka-6N for bug-grep@HIDDEN; Sun, 02 Apr 2023 05:33:59 -0400 Received: from tensor.andric.com ([87.251.56.140]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <dimitry@HIDDEN>) id 1piu5y-00074R-0e for bug-grep@HIDDEN; Sun, 02 Apr 2023 05:33:55 -0400 Received: from smtpclient.apple (longrow.home.andric.com [192.168.0.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id 6ACDF6C235; Sun, 2 Apr 2023 11:33:48 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=andric.com; s=201904; t=1680428028; bh=j++piMyInCb64AqGnHXFa/agOcQZYX5hFJCloNa82XE=; h=Subject:From:In-Reply-To:Date:Cc:Message-Id:References:To:From; b=OvpoxsYZyxsgQkhRl/xlrsYK62rT5hKW4sMbC0u3aDCYe7gYbalcM0EcbVcuZmsEp Ajq9eUTN6tzSLEXBOVFlEwFzoS/gkuLXqmKFqU4TKOJpsjfoXca+iNS3RN7w7k6HAP FyHBj1TZpHtlFZQW6GWjX1LTGv9+R+XmXi7cSlUAHzDwjONBsFO40E29XbZDoqHYdD xzUcUJSN1hpy7an+xzhBje/vvne3qpIEbReGf/Aab0z0j/KCjM6GT81+SKIog+el7H vLVb8tU69x6+y1CyoGlsfZ0IY17+aUdgLalblC2o7G4l8UObQXMmN1jOZ7QLq9QPgr OWfWv7UbnXYdg== Content-Type: multipart/signed; boundary="Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\)) From: Dimitry Andric <dimitry@HIDDEN> In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN> Date: Sun, 2 Apr 2023 11:33:42 +0200 Message-Id: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN> References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> <202304020652.3326q2Ut024466@HIDDEN> X-Mailer: Apple Mail (2.3731.500.231) Received-SPF: pass client-ip=87.251.56.140; envelope-from=dimitry@HIDDEN; helo=tensor.andric.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.4 (-) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -2.4 (--) --Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Ah sorry, I did indeed rebuild grep with --with-included-regex, and for debugging purposes added CFLAGS=3D"-O0 -g". In any case, the problematic code is both in glibc and grep, as I believe these are originating from the same source. -Dimitry > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote: >=20 > Hi. >=20 > Dimitry Andric <dimitry@HIDDEN> wrote: >=20 >> Yes, it still reproduces when I configure the latest grep using >> --without-included-regex: >=20 > I assume you meant --with-included-regex? >=20 > If you really used --without-included-regex, that doesn't prove = anything... :-) >=20 > It's interesting, as gawk uses the same regex, but with different = flags. > It might be worth trying to understand which of the syntax bits > is causing this. >=20 > Thanks, >=20 > Arnold >=20 >>=20 >> 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) >> 1: idx =3D 0 >> (gdb) >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, = idx, nmatch); >> 1: idx =3D 0 >> (gdb) >> 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) >> 1: idx =3D 0 >> (gdb) >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) >> 1: idx =3D 0 >> (gdb) >> 1428 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, = prev_idx_match, >> 1: idx =3D 0 >> (gdb) >> 1432 if (__glibc_unlikely (cur_node < 0)) >> 1: idx =3D 0 >> (gdb) >> 1400 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) >> 1: idx =3D 0 >> (gdb) >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, = idx, nmatch); >> 1: idx =3D 0 >> (gdb) >> 1404 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) >> 1: idx =3D 0 >> (gdb) >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) >> 1: idx =3D 0 >>=20 >> The endless loop looks the same. grep's regexec.c is mostly the same = as >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N >> directives added. >>=20 >> -Dimitry >>=20 >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote: >>>=20 >>> This does not reproduce with gawk, even when I force use of the = regex >>> matcher. >>>=20 >>> What happens if grep is built with the included regex files instead = of >>> relying on the ones in the local glibc? >>>=20 >>> Arnold >>>=20 >>> Dimitry Andric <dimitry@HIDDEN> wrote: >>>=20 >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: >>>>>=20 >>>>> Running the command: >>>>>=20 >>>>> echo a | grep -E -w '((()|a)|())*' >>>>>=20 >>>>> does not terminate, and uses a LOT of processor time, for all = versions of >>>>> grep I have tried. >>>>>=20 >>>>> This is the smallest case that could be found; simplifying = anything in the >>>>> input and/or expression leads to correct behavior. >>>>=20 >>>> Reproducible with GNU grep 3.7 on Ubuntu 22: >>>>=20 >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM = TIME+ COMMAND >>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 = 0:08.32 grep -E -w ((()|a)|())* >>>>=20 >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's = regexec(3), and it is that implementation which ends up in an endless = loop. >>>>=20 >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node = never change from 0: >>>>=20 >>>> 1378 static reg_errcode_t >>>> 1379 __attribute_warn_unused_result__ >>>> 1380 set_regs (const regex_t *preg, const re_match_context_t = *mctx, size_t nmatch, >>>> 1381 regmatch_t *pmatch, bool fl_backtrack) >>>> 1382 { >>>> ... >>>> 1415 for (idx =3D pmatch[0].rm_so; idx <=3D pmatch[0].rm_eo ;) >>>> 1416 { >>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, = idx, nmatch); >>>> 1418 >>>> 1419 if ((idx =3D=3D pmatch[0].rm_eo && cur_node =3D=3D = mctx->last_node) >>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, = cur_node))) >>>> ... >>>> 1442 /* Proceed to next node. */ >>>> 1443 cur_node =3D proceed_next_node (mctx, nmatch, pmatch, = prev_idx_match, >>>> 1444 &idx, cur_node, >>>> 1445 &eps_via_nodes, fs); >>>> 1446 >>>> 1447 if (__glibc_unlikely (cur_node < 0)) >>>> ... >>>> 1465 } >>>> 1466 } >>>>=20 >>>> -Dimitry >>>>=20 >>>> P.S.: Interestingly this does not reproduce with BSD grep, which = returns immediately with "a". >>>>=20 >>=20 --Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.2 iF0EARECAB0WIQR6tGLSzjX8bUI5T82wXqMKLiCWowUCZClL9gAKCRCwXqMKLiCW o0KTAKCiIhcduUQaGUuiJfykNQNnR6+9UACg7EEpahAec8VtdrBL/qhtw0BjprA= =F/9f -----END PGP SIGNATURE----- --Apple-Mail=_C7CE2C4A-4BF8-4D9F-ABFD-9F42FF447F13--
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: arnold@HIDDEN Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Sun, 02 Apr 2023 10:05:01 +0000 Resent-Message-ID: <handler.62483.B.16804298582592 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: dimitry@HIDDEN, arnold@HIDDEN Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN Received: via spool by submit <at> debbugs.gnu.org id=B.16804298582592 (code B ref -1); Sun, 02 Apr 2023 10:05:01 +0000 Received: (at submit) by debbugs.gnu.org; 2 Apr 2023 10:04:18 +0000 Received: from localhost ([127.0.0.1]:39240 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1piuZN-0000fk-QN for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:18 -0400 Received: from lists.gnu.org ([209.51.188.17]:39736) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <arnold@HIDDEN>) id 1piuZL-0000fb-ID for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:16 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1piuZK-000719-JP for bug-grep@HIDDEN; Sun, 02 Apr 2023 06:04:14 -0400 Received: from frenzy.freefriends.org ([198.99.81.75] helo=freefriends.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <arnold@HIDDEN>) id 1piuZI-0002di-Og for bug-grep@HIDDEN; Sun, 02 Apr 2023 06:04:14 -0400 X-Envelope-From: arnold@HIDDEN Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 332A3umP023104 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 2 Apr 2023 04:03:57 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 332A3uGT023103; Sun, 2 Apr 2023 04:03:56 -0600 From: arnold@HIDDEN Message-Id: <202304021003.332A3uGT023103@HIDDEN> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@HIDDEN using -f Date: Sun, 02 Apr 2023 04:03:56 -0600 References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> <202304020652.3326q2Ut024466@HIDDEN> <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN> In-Reply-To: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN> User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Received-SPF: none client-ip=198.99.81.75; envelope-from=arnold@HIDDEN; helo=freefriends.org X-Spam_score_int: -14 X-Spam_score: -1.5 X-Spam_bar: - X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, FORGED_SPF_HELO=0.001, KHOP_HELO_FCRDNS=0.399, SPF_HELO_PASS=-0.001, SPF_NONE=0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) OK, thanks! Dimitry Andric <dimitry@HIDDEN> wrote: > Ah sorry, I did indeed rebuild grep with --with-included-regex, and for > debugging purposes added CFLAGS="-O0 -g". > > In any case, the problematic code is both in glibc and grep, as I > believe these are originating from the same source. > > -Dimitry > > > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote: > > > > Hi. > > > > Dimitry Andric <dimitry@HIDDEN> wrote: > > > >> Yes, it still reproduces when I configure the latest grep using > >> --without-included-regex: > > > > I assume you meant --with-included-regex? > > > > If you really used --without-included-regex, that doesn't prove anything... :-) > > > > It's interesting, as gawk uses the same regex, but with different flags. > > It might be worth trying to understand which of the syntax bits > > is causing this. > > > > Thanks, > > > > Arnold > > > >> > >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > >> 1: idx = 0 > >> (gdb) > >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > >> 1: idx = 0 > >> (gdb) > >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > >> 1: idx = 0 > >> (gdb) > >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > >> 1: idx = 0 > >> (gdb) > >> 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, > >> 1: idx = 0 > >> (gdb) > >> 1432 if (__glibc_unlikely (cur_node < 0)) > >> 1: idx = 0 > >> (gdb) > >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > >> 1: idx = 0 > >> (gdb) > >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > >> 1: idx = 0 > >> (gdb) > >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > >> 1: idx = 0 > >> (gdb) > >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > >> 1: idx = 0 > >> > >> The endless loop looks the same. grep's regexec.c is mostly the same as > >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N > >> directives added. > >> > >> -Dimitry > >> > >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote: > >>> > >>> This does not reproduce with gawk, even when I force use of the regex > >>> matcher. > >>> > >>> What happens if grep is built with the included regex files instead of > >>> relying on the ones in the local glibc? > >>> > >>> Arnold > >>> > >>> Dimitry Andric <dimitry@HIDDEN> wrote: > >>> > >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: > >>>>> > >>>>> Running the command: > >>>>> > >>>>> echo a | grep -E -w '((()|a)|())*' > >>>>> > >>>>> does not terminate, and uses a LOT of processor time, for all versions of > >>>>> grep I have tried. > >>>>> > >>>>> This is the smallest case that could be found; simplifying anything in the > >>>>> input and/or expression leads to correct behavior. > >>>> > >>>> Reproducible with GNU grep 3.7 on Ubuntu 22: > >>>> > >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > >>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())* > >>>> > >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop. > >>>> > >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0: > >>>> > >>>> 1378 static reg_errcode_t > >>>> 1379 __attribute_warn_unused_result__ > >>>> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch, > >>>> 1381 regmatch_t *pmatch, bool fl_backtrack) > >>>> 1382 { > >>>> ... > >>>> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > >>>> 1416 { > >>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > >>>> 1418 > >>>> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > >>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > >>>> ... > >>>> 1442 /* Proceed to next node. */ > >>>> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, > >>>> 1444 &idx, cur_node, > >>>> 1445 &eps_via_nodes, fs); > >>>> 1446 > >>>> 1447 if (__glibc_unlikely (cur_node < 0)) > >>>> ... > >>>> 1465 } > >>>> 1466 } > >>>> > >>>> -Dimitry > >>>> > >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a". > >>>> > >> >
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: arnold@HIDDEN Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Sun, 02 Apr 2023 10:05:02 +0000 Resent-Message-ID: <handler.62483.B62483.16804298522574 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: dimitry@HIDDEN, arnold@HIDDEN Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN X-Debbugs-Original-Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN, bug-grep@HIDDEN Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.16804298522574 (code B ref 62483); Sun, 02 Apr 2023 10:05:02 +0000 Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 10:04:12 +0000 Received: from localhost ([127.0.0.1]:39237 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1piuZI-0000fR-43 for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:12 -0400 Received: from frenzy.freefriends.org ([198.99.81.75]:59844 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <arnold@HIDDEN>) id 1piuZG-0000fJ-Gp for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 06:04:11 -0400 X-Envelope-From: arnold@HIDDEN Received: from freefriends.org (frenzy.freefriends.org [198.99.81.75]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 332A3umP023104 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 2 Apr 2023 04:03:57 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 332A3uGT023103; Sun, 2 Apr 2023 04:03:56 -0600 From: arnold@HIDDEN Message-Id: <202304021003.332A3uGT023103@HIDDEN> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@HIDDEN using -f Date: Sun, 02 Apr 2023 04:03:56 -0600 References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> <202304020652.3326q2Ut024466@HIDDEN> <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN> In-Reply-To: <4DC78C45-1C0C-4A9E-BBFB-6F752AC047F5@HIDDEN> User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: 1.0 (+) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: 0.0 (/) OK, thanks! Dimitry Andric <dimitry@HIDDEN> wrote: > Ah sorry, I did indeed rebuild grep with --with-included-regex, and for > debugging purposes added CFLAGS="-O0 -g". > > In any case, the problematic code is both in glibc and grep, as I > believe these are originating from the same source. > > -Dimitry > > > On 2 Apr 2023, at 08:52, arnold@HIDDEN wrote: > > > > Hi. > > > > Dimitry Andric <dimitry@HIDDEN> wrote: > > > >> Yes, it still reproduces when I configure the latest grep using > >> --without-included-regex: > > > > I assume you meant --with-included-regex? > > > > If you really used --without-included-regex, that doesn't prove anything... :-) > > > > It's interesting, as gawk uses the same regex, but with different flags. > > It might be worth trying to understand which of the syntax bits > > is causing this. > > > > Thanks, > > > > Arnold > > > >> > >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > >> 1: idx = 0 > >> (gdb) > >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > >> 1: idx = 0 > >> (gdb) > >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > >> 1: idx = 0 > >> (gdb) > >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > >> 1: idx = 0 > >> (gdb) > >> 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, > >> 1: idx = 0 > >> (gdb) > >> 1432 if (__glibc_unlikely (cur_node < 0)) > >> 1: idx = 0 > >> (gdb) > >> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > >> 1: idx = 0 > >> (gdb) > >> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > >> 1: idx = 0 > >> (gdb) > >> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > >> 1: idx = 0 > >> (gdb) > >> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > >> 1: idx = 0 > >> > >> The endless loop looks the same. grep's regexec.c is mostly the same as > >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N > >> directives added. > >> > >> -Dimitry > >> > >>> On 28 Mar 2023, at 08:46, arnold@HIDDEN wrote: > >>> > >>> This does not reproduce with gawk, even when I force use of the regex > >>> matcher. > >>> > >>> What happens if grep is built with the included regex files instead of > >>> relying on the ones in the local glibc? > >>> > >>> Arnold > >>> > >>> Dimitry Andric <dimitry@HIDDEN> wrote: > >>> > >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@HIDDEN> wrote: > >>>>> > >>>>> Running the command: > >>>>> > >>>>> echo a | grep -E -w '((()|a)|())*' > >>>>> > >>>>> does not terminate, and uses a LOT of processor time, for all versions of > >>>>> grep I have tried. > >>>>> > >>>>> This is the smallest case that could be found; simplifying anything in the > >>>>> input and/or expression leads to correct behavior. > >>>> > >>>> Reproducible with GNU grep 3.7 on Ubuntu 22: > >>>> > >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > >>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())* > >>>> > >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop. > >>>> > >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0: > >>>> > >>>> 1378 static reg_errcode_t > >>>> 1379 __attribute_warn_unused_result__ > >>>> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch, > >>>> 1381 regmatch_t *pmatch, bool fl_backtrack) > >>>> 1382 { > >>>> ... > >>>> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) > >>>> 1416 { > >>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); > >>>> 1418 > >>>> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) > >>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) > >>>> ... > >>>> 1442 /* Proceed to next node. */ > >>>> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, > >>>> 1444 &idx, cur_node, > >>>> 1445 &eps_via_nodes, fs); > >>>> 1446 > >>>> 1447 if (__glibc_unlikely (cur_node < 0)) > >>>> ... > >>>> 1465 } > >>>> 1466 } > >>>> > >>>> -Dimitry > >>>> > >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a". > >>>> > >> >
X-Loop: help-debbugs@HIDDEN Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate Resent-From: Paul Eggert <eggert@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-grep@HIDDEN Resent-Date: Sun, 02 Apr 2023 18:30:02 +0000 Resent-Message-ID: <handler.62483.B62483.168046019016876 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 62483 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: arnold@HIDDEN, dimitry@HIDDEN Cc: 62483 <at> debbugs.gnu.org, koen@HIDDEN Received: via spool by 62483-submit <at> debbugs.gnu.org id=B62483.168046019016876 (code B ref 62483); Sun, 02 Apr 2023 18:30:02 +0000 Received: (at 62483) by debbugs.gnu.org; 2 Apr 2023 18:29:50 +0000 Received: from localhost ([127.0.0.1]:42640 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pj2Sc-0004O8-7q for submit <at> debbugs.gnu.org; Sun, 02 Apr 2023 14:29:50 -0400 Received: from mail.cs.ucla.edu ([131.179.128.66]:39294) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eggert@HIDDEN>) id 1pj2SX-0004Ns-KW for 62483 <at> debbugs.gnu.org; Sun, 02 Apr 2023 14:29:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id AEBCE3C09FA02; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id oyIQv3a3PBiJ; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 752483C09FA01; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.cs.ucla.edu 752483C09FA01 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu; s=9D0B346E-2AEB-11ED-9476-E14B719DCE6C; t=1680460179; bh=2f3sGrU+5qn62ukHicLKudSRcxmJsffhDbks/B5+0OI=; h=Message-ID:Date:MIME-Version:To:From; b=mnBN4wAUHPlu43V8zbNB9T9pxfGf6HyO7gPzgnJgbfqqBXvW/gp9J17F6/nI3rVGT +rrkdp/1IUPB5eBvCcQVeBn2RpKvqC13iaomLjQ4AVagHIh4UuiftUMSw2cWamRB6W Y2dQr4gZNvJvDgfPnhopgZd8M4Vmb2Ndoa3j7c9n6TEAp74gomzh9DqX43neTH8sVs NlK6CuUZbnfuu3E69cBE1NrvW6XQqr9vz0xfj5qILzwTiilGWONSjPfOer4tt5nzRl ZLb3+e1lUwDk6I/8kVFh3q3CE7Py8BCBNZuXWPyDbQPObf+WY4giKoTmeImHUqXHFh 23i2Aa5w9ntag== X-Virus-Scanned: amavisd-new at mail.cs.ucla.edu Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id xT4zfD34fpKg; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) Received: from [192.168.1.9] (cpe-172-91-119-151.socal.res.rr.com [172.91.119.151]) by mail.cs.ucla.edu (Postfix) with ESMTPSA id 4A99F3C097AFE; Sun, 2 Apr 2023 11:29:39 -0700 (PDT) Message-ID: <576a26bb-b195-dd33-a5a4-ba38115889d5@HIDDEN> Date: Sun, 2 Apr 2023 11:29:39 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.0 Content-Language: en-US References: <CAHJ_xkTYjONGLYWUL8uhnOVHHsBYEVqZ8WSPbr4nn8RrdfykvA@HIDDEN> <C23CEBBF-8231-46B4-8224-E4FCEED45A7D@HIDDEN> <202303280646.32S6krUr030677@HIDDEN> <EC26823E-6BD3-4EC6-8959-4C5AF60D83FB@HIDDEN> <202304020652.3326q2Ut024466@HIDDEN> From: Paul Eggert <eggert@HIDDEN> Organization: UCLA Computer Science Department In-Reply-To: <202304020652.3326q2Ut024466@HIDDEN> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.1 (-) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -2.1 (--) On 2023-04-01 23:52, arnold@HIDDEN wrote: > It's interesting, as gawk uses the same regex, but with different flags. Also, GNU grep -w passes the following more-complicated regexp to dfaparse: (^|[^[:alnum:]_])(((()|a)|())*)([^[:alnum:]_]|$) and quite possibly the bug is related to this extra complexity.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.