Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org.

From: Benson Muite <benson_muite@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Cc: 55331 <at> debbugs.gnu.org
Date: Mon, 9 May 2022 21:44:17 +0300
Subject: Re: bug#55331: Improved support for combining diacritics

On 5/9/22 21:30, Paul Eggert wrote:
> On 5/8/22 23:38, Benson Muite wrote:
>
> It might be nice for 'grep' to have ways to perform Unicode
> normalization before matching. In the meantime perhaps you can get what
> you want by normalizing the text before running it through 'grep'.

Thanks for the advice. uconv should work.

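A minimal sketch of the "normalize, then match" idea from the exchange above. It is written in Python with the standard unicodedata module rather than uconv and grep, and it is only an illustration, not what either poster ran. The names LETTER, FOUR_LETTER_WORD and four_letter_words are hypothetical. The pattern also makes an assumption the original bracket expression does not: it accepts any combining mark in U+0300-U+036F on any base letter, instead of the specific Igbo letters listed in the report.

```python
import re
import unicodedata

# Assumption: after NFD normalization, one "letter" is a base character
# (a-z, backtick or apostrophe) followed by any run of combining marks
# from the Combining Diacritical Marks block (U+0300-U+036F).
LETTER = r"[a-z`'][\u0300-\u036f]*"

# Whitespace, then exactly four such letters, then end of line.
FOUR_LETTER_WORD = re.compile(r"\s((?:" + LETTER + r"){4})$")


def four_letter_words(lines):
    """Yield the last word of each line when it has exactly four letters."""
    for line in lines:
        # NFD splits precomposed characters into base letter + marks,
        # so every diacritic becomes a separate combining code point.
        text = unicodedata.normalize("NFD", line.rstrip("\n"))
        match = FOUR_LETTER_WORD.search(text)
        if match:
            yield match.group(1)


if __name__ == "__main__":
    sample = ["nwa \u00e0kw\u00e0\n", "ego ak\u1ee5\u0300\n"]
    for word in four_letter_words(sample):
        print(word)
```

With these two sample lines, only the first line's final word is reported; the word from the bug report (a, k, and u carrying two marks) is treated as three letters and skipped. The results are returned in NFD form, so they may need to be normalized back to NFC before further use.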
bug-grep@HIDDEN: bug#55331; Package grep.

From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
To: Benson Muite <benson_muite@HIDDEN>
Cc: 55331 <at> debbugs.gnu.org
Date: Mon, 9 May 2022 11:30:28 -0700
Subject: Re: bug#55331: Improved support for combining diacritics

On 5/8/22 23:38, Benson Muite wrote:
> When using
>
> grep -E "\s[a-z\`\'āáàēéèīíìịị̄ị́ị̀ōóòọọ̄ọọ́ọ̀ūúùụ̄ụ́ụ̀n̄ńǹm̄ḿm̀]{4}$"
>
> to extract 4 letter Igbo words

The {4} means "4 characters", not "4 letters", and a combining character
counts as a character.

It might be nice for 'grep' to have ways to perform Unicode
normalization before matching. In the meantime perhaps you can get what
you want by normalizing the text before running it through 'grep'.

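To make the distinction between characters and letters concrete, here is a small Python sketch (not part of grep; the helper names count_code_points and count_letters are made up for illustration). It counts the code points of the word from the report under two normalization forms, then counts only the characters that are not combining marks. It assumes that "letter" means a base character plus any combining marks attached to it.

```python
import unicodedata


def count_code_points(word, form):
    """Number of code points in `word` after normalizing to `form`."""
    return len(unicodedata.normalize(form, word))


def count_letters(word):
    """Count base characters, ignoring combining marks.

    unicodedata.combining() returns the canonical combining class,
    which is 0 for characters that are not combining marks.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return sum(1 for ch in decomposed if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    # The word from the report: a, k, u with dot below (U+1EE5),
    # followed by a combining grave accent (U+0300).
    word = "ak\u1ee5\u0300"
    print(count_code_points(word, "NFC"))  # 4
    print(count_code_points(word, "NFD"))  # 5
    print(count_letters(word))             # 3
```

The NFC count stays at four because Unicode has no single precomposed code point for u with both a dot below and a grave accent, so for this particular word normalization on its own would probably not be enough; the matching step would also have to treat combining marks as part of the preceding letter.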

From: Benson Muite <benson_muite@HIDDEN>
To: bug-grep@HIDDEN
Date: Mon, 9 May 2022 09:38:26 +0300
Subject: Improved support for combining diacritics

Hi,

Unicode allows for combining diacritics. When using

grep -E "\s[a-z\`\'āáàēéèīíìịị̄ị́ị̀ōóòọọ̄ọọ́ọ̀ūúùụ̄ụ́ụ̀n̄ńǹm̄ḿm̀]{4}$"

to extract 4 letter Igbo words from a text, akụ̀ is incorrectly
classified as a 4 letter word, when it is a three letter word.

Would a patch to fix this be accepted?

Regards,
Benson Muite

Benson Muite <benson_muite@HIDDEN>: bug-grep@HIDDEN.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.