GNU bug report logs - #43225
Grep treats extended Latin characters like whitespace


Package: grep

Reported by: Mayo Fark <mayofark <at> outlook.com>

Date: Sat, 5 Sep 2020 16:06:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 43225 in the body.
You can then email your comments to 43225 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#43225; Package grep. (Sat, 05 Sep 2020 16:06:02 GMT)

Acknowledgement sent to Mayo Fark <mayofark <at> outlook.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 05 Sep 2020 16:06:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Mayo Fark <mayofark <at> outlook.com>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: Grep treats extended Latin characters like whitespace
Date: Sat, 5 Sep 2020 14:27:56 +0000
What I did:
```
grep -Riw cone *
```

Expected result: lines with the word "cone" surrounded by whitespace, ignoring case.

What I got instead:
```
data/po/pt_BR.po:msgstr "Pressione o ícone de pódio para iniciar o tutorial"
```

Why this is a bug: the word ícone is not the same as cone and should not have been returned in the result set. It appears that grep treats the í character in ícone as whitespace, which affects other extended-Latin characters as well.
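A minimal way to check this behaviour without the original source tree might look like the sketch below. The helper name `count_word_matches` and the two ASCII sample lines are illustrative assumptions, not part of the report; only the `pt_BR.po` string comes from it. The result of the last call depends on the installed grep and on the active locale, so no expected value is given for it.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin in which the given
# pattern matches as a whole word (-w), ignoring case (-i).
# "|| true" keeps the exit status at 0 when the count is 0.
count_word_matches() {
    grep -ciw -- "$1" || true
}

# ASCII control cases: "Cone" is a whole word, "scone" is not.
printf 'a Cone here\n' | count_word_matches cone
printf 'scone\n' | count_word_matches cone

# The line from the report. In a UTF-8 locale, "í" is a letter, so
# "ícone" should not count as the whole word "cone".
printf 'msgstr "Pressione o ícone de pódio para iniciar o tutorial"\n' |
    count_word_matches cone
```

Running the last command once with a UTF-8 locale and once with `LC_ALL=C` is one way to see whether the match comes from locale handling.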



Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Wed, 09 Sep 2020 19:46:01 GMT)

Notification sent to Mayo Fark <mayofark <at> outlook.com>:
bug acknowledged by developer. (Wed, 09 Sep 2020 19:46:01 GMT)

Message #10 received at 43225-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Mayo Fark <mayofark <at> outlook.com>
Cc: 43225-done <at> debbugs.gnu.org
Subject: Re: bug#43225: Grep treats extended Latin characters like whitespace
Date: Wed, 9 Sep 2020 12:45:11 -0700
On 9/5/20 7:27 AM, Mayo Fark wrote:

> grep -Riw cone *
> ...
> data/po/pt_BR.po:msgstr "Pressione o ícone de pódio para iniciar o tutorial"

Thanks for the bug report. This bug is due to an overenthusiastic optimization 
that I installed in late 2016. I installed the attached patch to fix the bug.
[0001-grep-fix-w-bug-in-UTF-8-locales.patch (text/x-patch, attachment)]

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 08 Oct 2020 11:24:05 GMT)

This bug report was last modified 3 years and 172 days ago.
