From: Jean Abou Samra <jean@HIDDEN>
Subject: bug#57507: Regular expression matching depends on locale encoding
Date: Wed, 31 Aug 2022 18:54:50 +0200

Regular expressions do funky things with Unicode if a non-Unicode-aware
locale is set. Yet, they're purely string operations, so I don't think
it's expected that they depend on the locale encoding.

  $ LC_ALL=C guile3.0
  GNU Guile 3.0.7
  Copyright (C) 1995-2021 Free Software Foundation, Inc.

  Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
  This program is free software, and you are welcome to redistribute it
  under certain conditions; type `,show c' for details.

  Enter `,help' for help.
  scheme@(guile-user)> (use-modules (ice-9 regex))
  scheme@(guile-user)> (match:substring (string-match "\u203f" "\u3091"))
  ice-9/boot-9.scm:1685:16: In procedure raise-exception:
  In procedure make-regexp: Invalid preceding regular expression

  Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
  scheme@(guile-user) [1]> ,q
  scheme@(guile-user)> (match:substring (string-match "[\u203f]" "\u3091"))
  $1 = "\u3091"
  scheme@(guile-user)>

From: dsmich@HIDDEN
Subject: bug#57507: Regular expression matching depends on locale encoding
Date: Thu, 01 Sep 2022 19:34:17 +0000

Also remember that Guile uses the system C library regex routines, and
that it passes them C strings, not Guile strings.

(sorry for top post, too tired to fight with this web editor)

-Dale
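
A minimal C sketch of what the libc layer might see in the `LC_ALL=C` session above. It rests on an assumption that this thread does not confirm: that Guile converts both the pattern and the subject to the locale encoding before calling `regcomp`/`regexec`, replacing characters that encoding cannot represent with `?`. Under that assumption, `"\u203f"` reaches libc as the pattern `?`, which POSIX extended syntax rejects because there is nothing for `?` to repeat, while `"[\u203f]"` becomes `[?]` and matches the `?` that `"\u3091"` was turned into. The helper `ere_match` is hypothetical; it uses `REG_EXTENDED`, assumed here to correspond to the default syntax of `make-regexp`.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN with POSIX extended syntax and
   test it against SUBJECT.  Returns 1 on a match and 0 otherwise.
   If regcomp fails, the libc error message is copied into ERRBUF and
   the negated error code is returned. */
static int ere_match(const char *pattern, const char *subject,
                     char *errbuf, size_t errlen)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        /* A pattern that starts with '?' fails here (REG_BADRPT). */
        regerror(rc, &re, errbuf, errlen);
        return -rc;
    }
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, `ere_match("?", "?", buf, sizeof buf)` returns a negative value, and with glibc `buf` then holds "Invalid preceding regular expression", the message from the transcript; `ere_match("[?]", "?", buf, sizeof buf)` returns 1.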

From: Ludovic Courtès <ludo@HIDDEN>
Subject: bug#57507: Regular expression matching depends on locale encoding
Date: Mon, 05 Sep 2022 09:48:36 +0200

Hi Jean,

Jean Abou Samra <jean@HIDDEN> wrote:

> Regular expressions do funky things with Unicode if a non-Unicode-aware
> locale is set. Yet, they're purely string operations, so I don't think
> it's expected that they depend on the locale encoding.

This is the expected behavior: first, because (ice-9 regex) is
implemented in terms of the libc regex functions, as Dale pointed out
(though that could be thought of as an implementation detail), and
second, because things such as character classes are necessarily
locale-dependent (this has bitten us in the past, for instance with
<https://bugs.gnu.org/35785>).

I hope that makes sense.

Thanks,
Ludo’.
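
The locale dependence of character classes can be observed directly at the libc level. The C sketch below (the helper name `all_alpha_in_locale` is hypothetical) checks whether a byte string consists only of `[[:alpha:]]` characters under a given LC_CTYPE locale. In the "C" locale only ASCII letters qualify, so the UTF-8 bytes of "é" do not match; under a UTF-8 locale such as "C.UTF-8", if one is installed, the same pattern may well accept them. Because `setlocale` changes the locale of the whole process, the helper restores the previous LC_CTYPE before returning.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if SUBJECT consists only of [[:alpha:]]
   characters when LC_CTYPE is LOCALE_NAME, 0 if it does not, and -1 if
   the locale is unavailable or the pattern cannot be compiled. */
static int all_alpha_in_locale(const char *locale_name, const char *subject)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    /* Copy the name: later setlocale calls may overwrite that string. */
    snprintf(saved, sizeof saved, "%s", current ? current : "C");

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    regex_t re;
    int result = -1;
    /* [[:alpha:]] is resolved against the LC_CTYPE active right now. */
    if (regcomp(&re, "^[[:alpha:]]+$", REG_EXTENDED | REG_NOSUB) == 0) {
        result = regexec(&re, subject, 0, NULL, 0) == 0;
        regfree(&re);
    }
    setlocale(LC_CTYPE, saved);   /* process-wide, so always restore */
    return result;
}
```

For instance, `all_alpha_in_locale("C", "abc")` returns 1 and `all_alpha_in_locale("C", "\xc3\xa9")` returns 0.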

From: Jean Abou Samra <jean@HIDDEN>
Subject: bug#57507: Regular expression matching depends on locale encoding
Date: Mon, 5 Sep 2022 20:39:26 +0200

On 05/09/2022 at 09:48, Ludovic Courtès wrote:
> This is the expected behavior: first because (ice-9 regex) is
> implemented in terms of the libc regex functions, [...] and second because things
> such as character classes are necessarily locale-dependent [...]

OK, thanks, but in this case, it should be clearly stated as a limitation
in the (ice-9 regex) documentation IMHO. If you don't know what constraints
there are on the implementation, there is no reason to expect this. Would it
help if I submitted a patch for that?

From: Ludovic Courtès <ludo@HIDDEN>
Subject: bug#57507: Regular expression matching depends on locale encoding
Date: Mon, 05 Sep 2022 21:24:35 +0200

Hi,

Jean Abou Samra <jean@HIDDEN> wrote:

> OK, thanks, but in this case, it should be clearly stated as a limitation
> in the (ice-9 regex) documentation IMHO. If you don't know what constraints
> there are on the implementation, there is no reason to expect this. Would it
> help if I submitted a patch for that?

Yes, that’d be welcome. I would not call it a constraint or limitation;
for example, the fact that ‘w’ is not a letter in Swedish is the kind of
thing you’d generally want to take into account.

Now, it’d be nice if one could easily specify the locale to operate
under, with an API similar to that of (ice-9 i18n) and its first-class
locale objects.

Thanks,
Ludo’.
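
The idea of passing an explicit locale has a rough C counterpart in POSIX locale objects: `newlocale` builds a `locale_t`, and `uselocale` installs it for the calling thread only, unlike `setlocale`. The sketch below (the helper `regex_match_in_locale` is hypothetical, not part of Guile or libc) compiles and runs a pattern under such an object, then restores the thread's previous locale. Whether an (ice-9 regex) API taking an (ice-9 i18n) locale object would be built this way is an assumption, not something this thread describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match SUBJECT against the extended regex PATTERN
   with LC_CTYPE taken from LOCALE_NAME, leaving the process-wide locale
   untouched.  Returns 1 on a match, 0 on no match, and -1 if the locale
   is unavailable or the pattern does not compile. */
static int regex_match_in_locale(const char *locale_name,
                                 const char *pattern, const char *subject)
{
    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;

    locale_t previous = uselocale(loc);   /* affects this thread only */
    regex_t re;
    int result = -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) == 0) {
        result = regexec(&re, subject, 0, NULL, 0) == 0;
        regfree(&re);
    }
    uselocale(previous);                  /* restore the thread's locale */
    freelocale(loc);
    return result;
}
```

For example, `regex_match_in_locale("C", "^[[:alpha:]]+$", "abc")` returns 1, while other threads keep whatever locale they had during the call.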