GNU bug report logs - #57507
Regular expression matching depends on locale encoding

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: guile; Reported by: Jean Abou Samra <jean@HIDDEN>; dated Wed, 31 Aug 2022 16:55:02 UTC; Maintainer for guile is bug-guile@HIDDEN.

Message received at 57507 <at> debbugs.gnu.org:


From: Ludovic Courtès <ludo@HIDDEN>
To: Jean Abou Samra <jean@HIDDEN>
Subject: Re: bug#57507: Regular expression matching depends on locale encoding
Date: Mon, 05 Sep 2022 21:24:35 +0200
In-Reply-To: <a9cd5f37-caef-0554-ad5a-a5f1f7fd7919@HIDDEN> (Jean Abou
 Samra's message of "Mon, 5 Sep 2022 20:39:26 +0200")
Message-ID: <87czc939qk.fsf@HIDDEN>
Cc: 57507 <at> debbugs.gnu.org

Hi,

Jean Abou Samra <jean@HIDDEN> wrote:

> On 05/09/2022 at 09:48, Ludovic Courtès wrote:
>> Hi Jean,
>>
>> Jean Abou Samra <jean@HIDDEN> wrote:
>>
>>> Regular expressions do funky things with Unicode if a non-Unicode-aware
>>> locale is set. Yet, they're purely string operations, so I don't think
>>> it's expected that they depend on the locale encoding.
>> This is the expected behavior: first because (ice-9 regex) is
>> implemented in terms of the libc regex functions, as Dale put it (but that
>> could be thought of as an implementation detail), and second because things
>> such as character classes are necessarily locale-dependent (this has
>> bitten us in the past, for instance with <https://bugs.gnu.org/35785>).
>>
>> I hope that makes sense.
>
>
>
> OK, thanks, but in this case, it should be clearly stated as a limitation
> in the (ice-9 regex) documentation IMHO. If you don't know what constraints
> there are on the implementation, there is no reason to expect this. Would it
> help if I submitted a patch for that?

Yes, that’d be welcome.  I would not call it a constraint or limitation;
for example, that ‘w’ is not a letter in Swedish is the kind of thing
you’d generally want to take into account.  Now, it’d be nice if one
could easily specify the locale to operate under, with an API similar to
that of (ice-9 i18n) and its first-class locale objects.

Thanks,
Ludo’.
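
The wish for a regex API that takes an explicit locale has a rough analogue at the libc level. The following is only a sketch, assuming a POSIX.1-2008 libc whose regcomp and regexec honor the per-thread locale installed by uselocale. The helper name match_in_locale is hypothetical and is not part of Guile or libc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile and run a POSIX extended regex while
   `locale_name` is the calling thread's locale, then restore the
   previous locale.  Returns 1 on match, 0 on no match, and -1 if the
   locale is unavailable or the pattern does not compile. */
int match_in_locale(const char *locale_name, const char *pattern,
                    const char *text)
{
    assert(locale_name != NULL && pattern != NULL && text != NULL);

    locale_t loc = newlocale(LC_ALL_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;

    /* Switch only this thread's locale; the global locale is untouched. */
    locale_t previous = uselocale(loc);

    regex_t re;
    int result = -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) == 0) {
        result = (regexec(&re, text, 0, NULL, 0) == 0);
        regfree(&re);
    }

    uselocale(previous);
    freelocale(loc);
    return result;
}
```

For instance, match_in_locale("C", "^[[:alpha:]]+$", "guile") returns 1. Passing the locale as an argument keeps the choice visible at the call site instead of depending on whatever LC_ALL happened to be when the process started.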




Information forwarded to bug-guile@HIDDEN:
bug#57507; Package guile. Full text available.

Message received at 57507 <at> debbugs.gnu.org:


Message-ID: <a9cd5f37-caef-0554-ad5a-a5f1f7fd7919@HIDDEN>
Date: Mon, 5 Sep 2022 20:39:26 +0200
Subject: Re: bug#57507: Regular expression matching depends on locale encoding
To: Ludovic Courtès <ludo@HIDDEN>
From: Jean Abou Samra <jean@HIDDEN>
In-Reply-To: <87mtbe5kiz.fsf@HIDDEN>
Cc: 57507 <at> debbugs.gnu.org

On 05/09/2022 at 09:48, Ludovic Courtès wrote:
> Hi Jean,
>
> Jean Abou Samra <jean@HIDDEN> wrote:
>
>> Regular expressions do funky things with Unicode if a non-Unicode-aware
>> locale is set. Yet, they're purely string operations, so I don't think
>> it's expected that they depend on the locale encoding.
> This is the expected behavior: first because (ice-9 regex) is
> implemented in terms of the libc regex functions, as Dale put it (but that
> could be thought of as an implementation detail), and second because things
> such as character classes are necessarily locale-dependent (this has
> bitten us in the past, for instance with <https://bugs.gnu.org/35785>).
>
> I hope that makes sense.



OK, thanks, but in this case, it should be clearly stated as a limitation
in the (ice-9 regex) documentation IMHO. If you don't know what constraints
there are on the implementation, there is no reason to expect this. Would it
help if I submitted a patch for that?





Information forwarded to bug-guile@HIDDEN:
bug#57507; Package guile. Full text available.

Message received at 57507 <at> debbugs.gnu.org:


From: Ludovic Courtès <ludo@HIDDEN>
To: Jean Abou Samra <jean@HIDDEN>
Subject: Re: bug#57507: Regular expression matching depends on locale encoding
Date: Mon, 05 Sep 2022 09:48:36 +0200
In-Reply-To: <dba30c62-7ac8-21ea-0a79-6a22bb108284@HIDDEN> (Jean Abou
 Samra's message of "Wed, 31 Aug 2022 18:54:50 +0200")
Message-ID: <87mtbe5kiz.fsf@HIDDEN>
Cc: 57507 <at> debbugs.gnu.org

Hi Jean,

Jean Abou Samra <jean@HIDDEN> wrote:

> Regular expressions do funky things with Unicode if a non-Unicode-aware
> locale is set. Yet, they're purely string operations, so I don't think
> it's expected that they depend on the locale encoding.

This is the expected behavior: first because (ice-9 regex) is
implemented in terms of the libc regex functions, as Dale put it (but that
could be thought of as an implementation detail), and second because things
such as character classes are necessarily locale-dependent (this has
bitten us in the past, for instance with <https://bugs.gnu.org/35785>).

I hope that makes sense.

Thanks,
Ludo’.
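
The point about locale-dependent character classes can be seen directly with the libc functions that (ice-9 regex) is said to wrap. This is a small sketch rather than Guile's actual code, and the function names matches_in_current_locale and is_alpha_in_c_locale are hypothetical. It shows that, in the "C" locale, [[:alpha:]] accepts only ASCII letters.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text` matches the POSIX extended regex `pattern` under
   whatever locale is currently in effect, 0 if it does not, and -1 if
   the pattern fails to compile. */
int matches_in_current_locale(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}

/* Switch the whole process to the "C" locale (note: this changes global
   state) and ask whether `text` is a single alphabetic character. */
int is_alpha_in_c_locale(const char *text)
{
    setlocale(LC_ALL, "C");
    return matches_in_current_locale("^[[:alpha:]]$", text);
}
```

Here is_alpha_in_c_locale("e") yields 1, while is_alpha_in_c_locale("\xE9") yields 0. Under a Latin-1 locale, where byte 0xE9 encodes é, the same pattern would normally accept it, which is the kind of dependence the message describes.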




Information forwarded to bug-guile@HIDDEN:
bug#57507; Package guile. Full text available.

Message received at 57507 <at> debbugs.gnu.org:


Message-Id: <58cf2a302a753608ba9b978ebace5f13ef0fae70@webmail>
From: dsmich@HIDDEN
To: "'Jean Abou Samra'" <jean@HIDDEN>
Subject: RE: bug#57507: Regular expression matching depends on locale encoding
Date: Thu, 01 Sep 2022 19:34:17 +0000
Cc: "'57507 <at> debbugs.gnu.org'" <57507 <at> debbugs.gnu.org>

Also remember that Guile uses the system C library regex routines. And
it is using C strings, not Guile strings.

(sorry for top post, too tired to fight with this web editor)

-Dale

-----------------------------------------
From: "Jean Abou Samra" <jean@HIDDEN>
To: 57507 <at> debbugs.gnu.org
Cc:
Sent: Wednesday August 31 2022 12:55:13PM
Subject: bug#57507: Regular expression matching depends on locale encoding

> Regular expressions do funky things with Unicode if a non-Unicode-aware
> locale is set. Yet, they're purely string operations, so I don't think
> it's expected that they depend on the locale encoding.
>
> $ LC_ALL=C guile3.0
> GNU Guile 3.0.7
> Copyright (C) 1995-2021 Free Software Foundation, Inc.
>
> Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
> This program is free software, and you are welcome to redistribute it
> under certain conditions; type `,show c' for details.
>
> Enter `,help' for help.
> scheme@(guile-user)> (use-modules (ice-9 regex))
> scheme@(guile-user)> (match:substring (string-match "\u203f" "\u3091"))
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> In procedure make-regexp: Invalid preceding regular expression
>
> Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
> scheme@(guile-user) [1]> ,q
> scheme@(guile-user)> (match:substring (string-match "[\u203f]" "\u3091"))
> $1 = "\u3091"
> scheme@(guile-user)>





Information forwarded to bug-guile@HIDDEN:
bug#57507; Package guile. Full text available.

Message received at submit <at> debbugs.gnu.org:


Message-ID: <dba30c62-7ac8-21ea-0a79-6a22bb108284@HIDDEN>
Date: Wed, 31 Aug 2022 18:54:50 +0200
To: bug-guile@HIDDEN
From: Jean Abou Samra <jean@HIDDEN>
Subject: Regular expression matching depends on locale encoding

Regular expressions do funky things with Unicode if a non-Unicode-aware
locale is set. Yet, they're purely string operations, so I don't think
it's expected that they depend on the locale encoding.



$ LC_ALL=C guile3.0
GNU Guile 3.0.7
Copyright (C) 1995-2021 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (use-modules (ice-9 regex))
scheme@(guile-user)> (match:substring (string-match "\u203f" "\u3091"))
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
In procedure make-regexp: Invalid preceding regular expression

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
scheme@(guile-user) [1]> ,q
scheme@(guile-user)> (match:substring (string-match "[\u203f]" "\u3091"))
$1 = "\u3091"
scheme@(guile-user)>
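
The thread does not spell out why the transcript behaves this way. One reading consistent with the replies is sketched below. It assumes, without confirmation from the thread, that Guile hands libc a locale-encoded copy of each string and that characters the C locale cannot represent are replaced by "?". Under that assumption the first pattern reaches regcomp as "?" and the second as "[?]", with the subject string also reduced to "?". The helper names compile_status and matches are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compile `pattern` as a POSIX extended regex and return regcomp's
   status code (0 means success). */
int compile_status(const char *pattern)
{
    assert(pattern != NULL);

    regex_t re;
    int status = regcomp(&re, pattern, REG_EXTENDED);
    if (status == 0)
        regfree(&re);
    return status;
}

/* Return 1 if `pattern` matches somewhere in `text`, 0 if it does not,
   and -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

With glibc, compile_status("?") is nonzero: a repetition operator with nothing before it is rejected as REG_BADRPT, the error whose message is "Invalid preceding regular expression". By contrast, compile_status("[?]") succeeds and matches("[?]", "?") returns 1, which would account for the second call matching even though the two original characters differ.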





Acknowledgement sent to Jean Abou Samra <jean@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-guile@HIDDEN. Full text available.
Report forwarded to bug-guile@HIDDEN:
bug#57507; Package guile. Full text available.
Last modified: Mon, 5 Sep 2022 19:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.