Subject: bug#78439: Accent insensitive grep
To: 78439 <at> debbugs.gnu.org
Content-Type: text/plain; charset=UTF-8; format=Flowed
Date: Thu, 15 May 2025 05:49:00 +0000
Message-Id: <D9WHYA9BBOX7.394N0TBSJEIHJ@HIDDEN>
From: "Avid Seeker" <avidseeker@HIDDEN>
Re-iterating the question on SO <https://stackoverflow.com/questions/20937864/> of applying an
accent-insensitive grep to text (e.g., all accented forms of the letter 'e' should be
regarded as an ASCII 'e').
The response by Adam Katz mentions:
> You should not expect equivalence classes to be portable as they are too arcane.
What's the stance of grep developers on this? Are equivalence classes the
right tool to approach this? I see that they depend on LC_COLLATE, in
which case it would be possible to set up a custom locale that matches
digraphs.
In the example he gave, he also mentions:
> This matches all words like aei... [but won't match] æi... it's quite
> likely that digraphs are beyond the reach of even the best equivalence
> class map.
Is there a way to set up a locale without having to recompile glibc, or
are these locale values hardcoded into programs using glibc?
Thanks,
Avid
Subject: bug#78439: Accent insensitive grep
To: Avid Seeker <avidseeker@HIDDEN>
Cc: 78439 <at> debbugs.gnu.org
Message-ID: <36aaec4c-6a7e-4a7c-b9bf-e0ddf2efaa67@HIDDEN>
Date: Thu, 15 May 2025 09:19:14 -0700
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
References: <D9WHYA9BBOX7.394N0TBSJEIHJ@HIDDEN>
Content-Language: en-US
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
In-Reply-To: <D9WHYA9BBOX7.394N0TBSJEIHJ@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
On 2025-05-14 22:49, Avid Seeker via Bug reports for GNU grep wrote:
> are equivalence classes the
> right tool to approach this?
They're supposed to be, yes ...
> I see that they depend on LC_COLLATE, in
> which case it would be possible to setup a custom locale that matches
> digraphs.
... though you're venturing into uncharted territory here. Please let us
know of any monsters you find.
> Is there a way to setup a locale without having to recompile glibc
Yes, use localedef.
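For comparison only, here is a minimal sketch of accent-insensitive matching done outside grep. It does not use equivalence classes or LC_COLLATE at all; it swaps in a different technique, Unicode NFD decomposition followed by dropping combining marks, via Python's standard unicodedata module. The function names fold_accents and grep_accent_insensitive and the sample words are made up for illustration. It also shows the digraph problem raised above: 'æ' has no Unicode decomposition, so this kind of folding never turns it into "ae".

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose to NFD, then drop combining marks (the accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def grep_accent_insensitive(pattern: str, lines):
    """Yield lines that contain pattern once both sides are accent-folded.

    Hypothetical helper: plain substring search, not a regex engine.
    """
    folded_pattern = fold_accents(pattern)
    for line in lines:
        if folded_pattern in fold_accents(line):
            yield line


# Illustrative sample data, not taken from the bug report.
sample = ["élève", "eleve", "ælf", "aelf"]

# Accented and unaccented spellings both match the ASCII pattern.
matches_e = list(grep_accent_insensitive("eleve", sample))

# The digraph 'æ' is left untouched by folding, so only "aelf" matches "ae".
matches_ae = list(grep_accent_insensitive("ae", sample))

print(matches_e)
print(matches_ae)
```

Handling digraphs this way would need an explicit, hand-maintained mapping (for example 'æ' to "ae") on top of the folding step, which is roughly the part a custom locale built with localedef would have to supply for grep.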