GNU bug report logs - #60621
grep -P does not set PCRE2_UCP

Package: grep

Reported by: Karl Pettersson <karl.pettersson <at> klpn.se>

Date: Sat, 7 Jan 2023 07:38:03 UTC

Severity: normal

Merged with 60618

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


Report forwarded to bug-grep <at> gnu.org:
bug#60621; Package grep. (Sat, 07 Jan 2023 07:38:03 GMT)

Acknowledgement sent to Karl Pettersson <karl.pettersson <at> klpn.se>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 07 Jan 2023 07:38:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Karl Pettersson <karl.pettersson <at> klpn.se>
To: bug-grep <bug-grep <at> gnu.org>
Subject: grep -P does not set PCRE2_UCP
Date: Fri, 6 Jan 2023 21:41:53 +0100
Hi

Using grep -P for boundary matches yields incorrect results with
non-ASCII letters:

$ echo 'Öst' | grep -P '\bs'
Öst

There should be no output in this case: Ö is a letter, so there is no
word boundary between it and the following s. The culprit seems to be
this line in pcresearch.c:

      flags |= PCRE2_UTF;

If the PCRE2_UCP flag is added as follows, the program behaves
correctly:

      flags |= PCRE2_UTF|PCRE2_UCP;
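
The difference between the two flag settings can be illustrated with a small, self-contained model. This is only a sketch and not PCRE2's implementation: the names is_word_ascii, is_word_ucp_sketch, decode_utf8 and matches_b_s are hypothetical, the decoder handles only one- and two-byte UTF-8 sequences, and the "Unicode" classifier knows only the Latin-1 Supplement letters, whereas PCRE2 with PCRE2_UCP consults full Unicode property tables. It assumes that without PCRE2_UCP only ASCII letters, digits and underscore count as word characters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Classifier type: decides whether a code point is a "word" character. */
typedef bool (*word_fn)(uint32_t cp);

/* Models \w without PCRE2_UCP: ASCII letters, digits and underscore only. */
static bool is_word_ascii(uint32_t cp)
{
    return (cp >= '0' && cp <= '9') || (cp >= 'A' && cp <= 'Z') ||
           (cp >= 'a' && cp <= 'z') || cp == '_';
}

/* Crude stand-in for Unicode properties (an assumption for this sketch):
   ASCII word characters plus the letters in U+00C0..U+00FF, excluding the
   multiplication sign U+00D7 and the division sign U+00F7. */
static bool is_word_ucp_sketch(uint32_t cp)
{
    if (is_word_ascii(cp))
        return true;
    return cp >= 0xC0 && cp <= 0xFF && cp != 0xD7 && cp != 0xF7;
}

/* Decode one code point from a NUL-terminated UTF-8 string.
   Only one- and two-byte sequences are understood; any other byte is
   returned as U+FFFD. Returns the number of bytes consumed. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Returns true if the pattern \bs would match somewhere in text, i.e. if
   some 's' sits at a position where the word status of the previous
   character differs from that of the 's' itself. The start of the text
   counts as a non-word position. */
static bool matches_b_s(const char *text, word_fn is_word)
{
    const unsigned char *p = (const unsigned char *)text;
    bool prev_is_word = false;

    while (*p != '\0') {
        uint32_t cp;
        size_t n = decode_utf8(p, &cp);
        bool cur_is_word = is_word(cp);

        if (cp == 's' && cur_is_word != prev_is_word)
            return true;

        prev_is_word = cur_is_word;
        p += n;
    }
    return false;
}
```

With the UTF-8 bytes of "Öst" written as "\xC3\x96st", matches_b_s returns true for is_word_ascii, mirroring the reported output, and false for is_word_ucp_sketch, mirroring the expected behaviour once Ö is treated as a letter.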

The pcre2grep test program in the pcre2 project has the same problem,
and I filed an issue there too:

https://github.com/PCRE2Project/pcre2/issues/185

A Twitter discussion with more examples:

https://twitter.com/gro_tsen/status/1610972356972875777

Kind regards
-- 
Karl Pettersson
Uppsala, Sverige/Sweden

https://static-dust.klpn.se/




Information forwarded to bug-grep <at> gnu.org:
bug#60621; Package grep. (Sat, 07 Jan 2023 09:15:02 GMT)

Message #8 received at 60621 <at> debbugs.gnu.org:

From: Karl Pettersson <karl.pettersson <at> klpn.se>
To: bug#60621 <60621 <at> debbugs.gnu.org>
Subject: Duplicate of #60618
Date: Sat, 7 Jan 2023 10:14:22 +0100
Hi

I first filed the original issue against pcre2grep after a Twitter
discussion, and then also sent it to the bug-grep list. Carlo Arenas
had already noticed it, although as far as I could see it had not yet
been registered, so this report is a duplicate of #60618.

-- 
Karl Pettersson
Uppsala, Sverige/Sweden

https://static-dust.klpn.se/




Merged 60618 60621. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sat, 07 Jan 2023 22:56:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 05 Feb 2023 12:24:05 GMT)

This bug report was last modified 1 year and 74 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.