GNU bug report logs - #27681
grep: Combining Mark-Nonspacing are classified as [:punct:]


Package: grep

Reported by: Santiago <santiagorr <at> riseup.net>

Date: Thu, 13 Jul 2017 13:22:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 27681 in the body.
You can then email your comments to 27681 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#27681; Package grep. (Thu, 13 Jul 2017 13:22:02 GMT)

Acknowledgement sent to Santiago <santiagorr <at> riseup.net>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Thu, 13 Jul 2017 13:22:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Santiago <santiagorr <at> riseup.net>
To: bug-grep <at> gnu.org
Cc: 662629-submitter <at> bugs.debian.org
Subject: grep: Combining Mark-Nonspacing are classified as [:punct:]
Date: Thu, 13 Jul 2017 15:21:40 +0200
Hi,

I would like to forward the issue below, reported by Panu Kalliokoski
in 2012 (better late than never!). I think the correct category is
Mark-nonspacing (Mn), although I am not very familiar with Unicode.

It still occurs in grep 3.1. For example, with the U+0301 combining acute accent:

 $ echo árbol | grep -o '[[:alpha:]]*'
 a
 rbol

Cheers,

 -- Santiago

On Mon, 05 Mar 2012 13:08:43 +0200 "Panu A. Kalliokoski" <atehwa <at> sange.fi> wrote:
> Package: grep
> Version: 2.6.3-3
> Severity: normal
> 
> 
> It seems that grep misclassifies combining letters (unicode class Lm) as
> punctuation, when they should be letters.  For instance:
> 
> $ echo d̪ʌ̀lì | grep -o '[[:alpha:]]*'
> d
> ʌ
> li
> 
> As a consequence, combining accents are not seen as "word-constituent":
> 
> $ echo d̪ʌ̀lì | grep -o '\w*'
> d
> ʌ
> li
> 
> This also causes false positives on word-boundary conditions, such as
> the following:
> 
> $ echo d̪ʌ̀lì | grep -w ʌ
> d̪ʌ̀lì
> 
> I suggest that combining letters should be part of [:alpha:] instead of
> [:punct:].




Information forwarded to bug-grep <at> gnu.org:
bug#27681; Package grep. (Thu, 13 Jul 2017 19:04:02 GMT)

Message #8 received at 27681 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Santiago <santiagorr <at> riseup.net>, 27681 <at> debbugs.gnu.org
Cc: 662629-submitter <at> bugs.debian.org
Subject: Re: bug#27681: grep: Combining Mark-Nonspacing are classified as [:punct:]
Date: Thu, 13 Jul 2017 12:03:02 -0700
Surely this is a glibc bug, not a grep bug. Grep is just following the 
character classification of glibc. I can reproduce the problem by 
compiling and running the attached program, which uses only glibc (not 
grep). This program exits with status 1, whereas you want it to exit 
with status 0. So I suggest filing a glibc bug report.
[combining.c (text/x-csrc, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#27681; Package grep. (Mon, 17 Jul 2017 09:21:01 GMT)

Message #11 received at 27681 <at> debbugs.gnu.org:

From: Santiago <santiagorr <at> riseup.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 27681 <at> debbugs.gnu.org
Subject: Re: bug#27681: grep: Combining Mark-Nonspacing are classified as [:punct:]
Date: Mon, 17 Jul 2017 11:20:36 +0200
On 13/07/17 at 12:03, Paul Eggert wrote:
> Surely this is a glibc bug, not a grep bug. Grep is just following the
> character classification of glibc. I can reproduce the problem by compiling
> and running the attached program, which uses only glibc (not grep). This
> program exits with status 1, whereas you want it to exit with status 0. So I
> suggest filing a glibc bug report.

Done. Thanks,

  -- Santiago




Bug closed; send any further explanations to 27681 <at> debbugs.gnu.org and Santiago <santiagorr <at> riseup.net>. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Tue, 31 Dec 2019 19:18:01 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 29 Jan 2020 12:24:07 GMT)



