GNU bug report logs - #51458
grep PCRE - mean


Package: grep;

Reported by: "Skrzyniarz, Slawomir (Nokia - PL/Krakow)" <slawomir.skrzyniarz <at> nokia.com>

Date: Thu, 28 Oct 2021 09:11:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Report forwarded to bug-grep <at> gnu.org:
bug#51458; Package grep. (Thu, 28 Oct 2021 09:11:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Skrzyniarz, Slawomir (Nokia - PL/Krakow)" <slawomir.skrzyniarz <at> nokia.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Thu, 28 Oct 2021 09:11:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Skrzyniarz, Slawomir (Nokia - PL/Krakow)"
 <slawomir.skrzyniarz <at> nokia.com>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: grep PCRE - mean
Date: Thu, 28 Oct 2021 08:23:08 +0000
Hello Grep Team,
I updated grep from version 2.20 to 3.1 and noticed that grep with the -P option
no longer matches the regular expression below:

cat SomeTestFile.cpp | sed -r -e 's:\/(\*([^*]|\*[^\/])*[*]\/|\/.*)::g' -e 's:\"[^"]*\"::g' |
grep -ozPLq '\A(?:\s*^(?:#\w+.*\s*|extern\s+.+)$)*+(?<namespace>\s*namespace(?:\s+ utTestNamespace \s*(?>(?<block>{(?:[^{}]*(?&block)*)*}))|(\s*[\w:]*\s*{)(?&namespace)\s*}))\s*\z'; echo "retcode $?"

Content of file SomeTestFile.cpp:
#include <memory>
#include <vector>
#include <gtest/gtest.h>

namespace utTestNamespace
{
using ::testing::NiceMock;
# some code here
}
//end of file


I checked the regular expression on regex101.com and found that it works there for both PCRE and PCRE2, but it no longer works in grep 3.1 and later versions (versions between 2.20 and 3.1 were not checked).
See link:
https://regex101.com/r/9NwluI/1/

Investigation shows that grep 3.1, and the later versions 3.6 and 3.7, handle "^" and "$" differently with the "-P" option.
It looks as though "^" does not match at every beginning of line and "$" does not match at every end of line.

It seems that "^" is treated as the beginning of the whole test string, not of each line.
"$" appears to match only the end of the whole test string, not the end of each line.

Is this intended behavior, or is it a bug in grep?

Useful commands for testing:
cat SomeTestFile.cpp | sed -r -e 's:\/(\*([^*]|\*[^\/])*[*]\/|\/.*)::g' -e 's:\"[^"]*\"::g' | grep -zP '(?:\s*^(?:\#\w+.*\s*|extern\s+.+)$)*+'
cat SomeTestFile.cpp | sed -r -e 's:\/(\*([^*]|\*[^\/])*[*]\/|\/.*)::g' -e 's:\"[^"]*\"::g' | grep -zP '(?:\s*^(?:\#\w+.*\s*|extern\s+.+)\s*)*+'


Best Regards,
Sławek
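
The behaviour described above can be reduced to a much smaller test. The following is only a sketch: it assumes GNU grep 3.1 or later built with PCRE support, and the three-line sample input and the `sample` helper are invented for illustration, not taken from the report. With -z the whole input is one record, so "^" and "$" anchor to that record unless multiline mode is requested.

```shell
#!/bin/sh
# Assumption: GNU grep 3.1 or later, built with PCRE support (-P).
# The sample input below is made up for illustration.
sample() { printf 'alpha\nbeta\ngamma\n'; }

# With -z the input is a single NUL-terminated record, so the newlines
# are ordinary characters inside that record.

# '^' and '$' anchor to the record, not to each line inside it.
if sample | grep -qzP '^beta$'; then plain=match; else plain=no-match; fi

# '^' still matches at the very start of the record.
if sample | grep -qzP '^alpha'; then start=match; else start=no-match; fi

# Newlines inside the record can be matched explicitly instead.
if sample | grep -qzP '\nbeta\n'; then explicit=match; else explicit=no-match; fi

echo "plain=$plain start=$start explicit=$explicit"
```

On a grep that behaves as reported here, this should print `plain=no-match start=match explicit=match`. A grep built without -P support exits with an error and every test would report `no-match`.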

Information forwarded to bug-grep <at> gnu.org:
bug#51458; Package grep. (Mon, 08 Nov 2021 08:54:01 GMT) Full text and rfc822 format available.

Message #8 received at 51458 <at> debbugs.gnu.org (full text, mbox):

From: "Skrzyniarz, Slawomir (Nokia - PL/Krakow)"
 <slawomir.skrzyniarz <at> nokia.com>
To: "51458 <at> debbugs.gnu.org" <51458 <at> debbugs.gnu.org>
Subject: RE: grep PCRE - "^" and "$" are not recognized as begin and end of
 line for multiline strings
Date: Mon, 8 Nov 2021 07:38:33 +0000
Hello team,
I've changed the subject to describe the issue better: "grep PCRE - "^" and "$" are not recognized as begin and end of line for multiline strings".

Best Regards,
Sławek

Information forwarded to bug-grep <at> gnu.org:
bug#51458; Package grep. (Mon, 08 Nov 2021 20:30:02 GMT) Full text and rfc822 format available.

Message #11 received at 51458 <at> debbugs.gnu.org (full text, mbox):

From: Carlo Marcelo Arenas Belón <carenas <at> gmail.com>
To: slawomir.skrzyniarz <at> nokia.com
Cc: 51458 <at> debbugs.gnu.org
Subject: bug#51458: grep PCRE - '^' and '$' are not recognized as begin and
 end of line for multiline strings
Date: Mon, 8 Nov 2021 12:29:19 -0800
Older versions of grep's PCRE support used multiline mode by default, which
your expression seems to require in order to work and which is also on by
default on the regex site.

You can add it back using an internal option[1] from PCRE, as shown in the
following modified expression from your original example:

/\A(?m:\s*^(?:#\w+.*\s*|extern\s+.+)$)*+(?<namespace>\s*namespace(?:\s+utTestNamespace\s*(?>(?<block>{(?:[^{}]*(?&block)*)*}))|(\s*[\w:]*\s*{)(?&namespace)\s*}))\s*\z/

Carlo

[1] https://www.pcre.org/current/doc/html/pcre2pattern.html#internaloptions
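
As a small sketch of how the internal option behaves (assuming GNU grep 3.1 or later with -P support; the sample input and the `sample` helper are invented for illustration), multiline mode can be switched on either for one group with `(?m:...)` or for the rest of the pattern with a leading `(?m)`. In both cases `\A` keeps its meaning of "start of the whole record".

```shell
#!/bin/sh
# Assumption: GNU grep 3.1 or later, built with PCRE support (-P).
# The sample input below is made up for illustration.
sample() { printf '#include <a>\n#include <b>\nint x;\n'; }

# Scoped form: multiline applies only inside the (?m: ... ) group.
if sample | grep -qzP '\A(?m:^#include.*$\s*)+int x;'
then scoped=match; else scoped=no-match; fi

# Global form: a leading (?m) turns multiline on for the rest of the pattern.
if sample | grep -qzP '(?m)\A(?:^#include.*$\s*)+int x;'
then global=match; else global=no-match; fi

# Neither form: '$' cannot match at the end of the first #include line.
if sample | grep -qzP '\A(?:^#include.*$\s*)+int x;'
then neither=match; else neither=no-match; fi

echo "scoped=$scoped global=$global neither=$neither"
```

The scoped form is the smaller change, because anchors elsewhere in the pattern keep their current meaning.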




Information forwarded to bug-grep <at> gnu.org:
bug#51458; Package grep. (Tue, 09 Nov 2021 12:07:03 GMT) Full text and rfc822 format available.

Message #14 received at 51458 <at> debbugs.gnu.org (full text, mbox):

From: "Skrzyniarz, Slawomir (Nokia - PL/Krakow)"
 <slawomir.skrzyniarz <at> nokia.com>
To: "51458 <at> debbugs.gnu.org" <51458 <at> debbugs.gnu.org>
Subject: RE: bug#51458: grep PCRE - '^' and '$' are not recognized as begin
 and end of line for multiline strings
Date: Tue, 9 Nov 2021 06:48:56 +0000
Thank you, Carlo.
Replacing (?: with (?m:
solves my issue.

Thank you,
Sławek
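
The replacement can be checked on a shortened form of the reported pattern. This is a sketch under assumptions: GNU grep 3.1 or later with -P support, a pattern cut off after the namespace name, and only the first lines of SomeTestFile.cpp as input. Those lines contain no comments or string literals, so the sed stage is left out. The names `head_of_file`, `old` and `new` are invented for the example.

```shell
#!/bin/sh
# Assumption: GNU grep 3.1 or later, built with PCRE support (-P).
# Input: the opening lines of SomeTestFile.cpp from the report.
head_of_file() {
  printf '%s\n' '#include <memory>' '#include <vector>' \
    '#include <gtest/gtest.h>' '' 'namespace utTestNamespace' '{'
}

# Leading part of the reported pattern, truncated after the namespace name.
old='\A(?:\s*^(?:#\w+.*\s*|extern\s+.+)$)*+\s*namespace\s+utTestNamespace'
# Same pattern with the first group changed from (?: to (?m:
new='\A(?m:\s*^(?:#\w+.*\s*|extern\s+.+)$)*+\s*namespace\s+utTestNamespace'

if head_of_file | grep -qzP "$old"
then old_result=match; else old_result=no-match; fi
if head_of_file | grep -qzP "$new"
then new_result=match; else new_result=no-match; fi

echo "old=$old_result new=$new_result"
```

With the original group, "$" can never match after an #include line, so the possessive group matches nothing and `\s*namespace` fails at the start of the record. With `(?m:` the group consumes the three #include lines and the rest of the pattern can match.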




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 09 Nov 2021 18:06:02 GMT) Full text and rfc822 format available.

Notification sent to "Skrzyniarz, Slawomir (Nokia - PL/Krakow)" <slawomir.skrzyniarz <at> nokia.com>:
bug acknowledged by developer. (Tue, 09 Nov 2021 18:06:02 GMT) Full text and rfc822 format available.

Message #19 received at 51458-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Skrzyniarz, Slawomir (Nokia - PL/Krakow)" <slawomir.skrzyniarz <at> nokia.com>,
 "51458 <at> debbugs.gnu.org" <51458-done <at> debbugs.gnu.org>
Subject: Re: bug#51458: grep PCRE - '^' and '$' are not recognized as begin
 and end of line for multiline strings
Date: Tue, 9 Nov 2021 10:05:31 -0800
On 11/8/21 22:48, Skrzyniarz, Slawomir (Nokia - PL/Krakow) wrote:
> Solve my issue.

Thanks for letting us know; closing the bug report.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 08 Dec 2021 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 101 days ago.
