GNU bug report logs - #39970
guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: guix; Reported by: "pelzflorian (Florian Pelz)" <pelzflorian@HIDDEN>; dated Sat, 7 Mar 2020 12:02:01 UTC; Maintainer for guix is bug-guix@HIDDEN.

Message received at 39970 <at> debbugs.gnu.org:


Received: (at 39970) by debbugs.gnu.org; 12 Mar 2020 16:05:35 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Mar 12 12:05:35 2020
Received: from localhost ([127.0.0.1]:57731 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jCQL1-0007Li-Bx
	for submit <at> debbugs.gnu.org; Thu, 12 Mar 2020 12:05:35 -0400
Received: from eggs.gnu.org ([209.51.188.92]:39236)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1jCQL0-0007LN-4z
 for 39970 <at> debbugs.gnu.org; Thu, 12 Mar 2020 12:05:34 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:60468)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <ludo@HIDDEN>)
 id 1jCQKu-0001YT-MO; Thu, 12 Mar 2020 12:05:28 -0400
Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (port=49886 helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <ludo@HIDDEN>)
 id 1jCQKu-0006Cm-1q; Thu, 12 Mar 2020 12:05:28 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian@HIDDEN>
Subject: Re: bug#39970: guix commands broken on Azerbaijani 'az_AZ' and
 Turkish 'tr_TR' locales
References: <20200307120052.ocwzphlvemvmb2ts@HIDDEN>
 <20200307152003.myj7jkjthokbmark@HIDDEN>
 <20200308070804.ylpb5yrwpgbc3p3w@HIDDEN>
 <8736ah1mxb.fsf@HIDDEN>
 <20200312110206.2hsinzejnmcefmot@HIDDEN>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: 23 =?utf-8?Q?Vent=C3=B4se?= an 228 de la =?utf-8?Q?R?=
 =?utf-8?Q?=C3=A9volution?=
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Thu, 12 Mar 2020 17:05:26 +0100
In-Reply-To: <20200312110206.2hsinzejnmcefmot@HIDDEN>
 (pelzflorian@HIDDEN's message of "Thu, 12 Mar 2020 12:02:06
 +0100")
Message-ID: <874kutsgmx.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 39970
Cc: 39970 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.7 (-)

Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@HIDDEN> skribis:

> On Mon, Mar 09, 2020 at 06:02:40PM +0100, Ludovic Courtès wrote:
>> To me it’s not a bug in Guile, but simply the fact that regexps, as
>> implemented by the C library, are locale-dependent.
>>
>
> (use-modules (ice-9 regex))
> (regexp-exec (make-regexp "^([a-z]+)$")
>              "iyiyim")
> ⇒ #f
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a friend of mine who is a native Turkish speaker.  It is
> different from the behavior of current glibc:
>
> florian@florianmacbook ~$ cat iyiyim.c
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define STR "iyiyım"
> int main (int    argc,
>           char** argv)
> {

You’re seeing a different behavior because you forgot a:

  setlocale (LC_ALL, "");

call here.
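
A minimal sketch of that test with the missing call added, so that the
regexp functions see the locale the caller asks for.  The helper name
match_in_locale is made up for this illustration; passing "" as the
locale makes it follow LANG/LC_ALL, as the program above intended.  The
result under tr_TR.utf8 depends on the installed locales and the libc
version, so no output is claimed here.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of Guix or Guile: return 1 if STR matches
   the POSIX extended regexp PATTERN under LOCALE, 0 if it does not, and -1
   if the locale is unavailable or the pattern does not compile.  Note that
   it changes the locale of the whole process.  */
int
match_in_locale (const char *locale, const char *pattern, const char *str)
{
  regex_t rx;
  int result;

  assert (locale != NULL && pattern != NULL && str != NULL);

  /* Without this call the process stays in the "C" locale, whatever
     LANG or LC_ALL say.  */
  if (setlocale (LC_ALL, locale) == NULL)
    return -1;

  if (regcomp (&rx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  result = (regexec (&rx, str, 0, NULL, 0) == 0) ? 1 : 0;
  regfree (&rx);
  return result;
}
```

Calling match_in_locale ("", "^[a-z]+$", "iyiyim") with
LC_ALL=tr_TR.utf8 in the environment, and then again with the enumerated
class "^[0-9abcdfghijklmnpqrsvwxyz]+$", is one way to compare the range
with the explicit list under the same locale.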

>> The patch you proposed looks good to me, though perhaps we could
>> explicitly list all the alphabet in the regexp?
>>
>> A better option is to reimplement ‘store-path-package-name’ in a way
>> similar to ‘store-path-hash-part’, as in commit
>> 35eb77b09d957019b2437e7681bd88013d67d3cd.
>
> I suppose it would be better to cache the compiled regexp.  What is
> this mcached syntax inside (guix store)?  Or do I use Scheme’s 'delay'
> and 'force' for caching?

I lean towards avoiding regexps altogether, as I wrote above.

WDYT?

> The attached patch fixes the regexp.  Shall I push the attached patch
> and then try making it cache the compiled regexp or do you still
> prefer an implementation without regexps?  Why would not using a
> regexp be better?

It reduces reliance on libc, reduces complexity, and performs better as
noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.
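
To illustrate the idea only, here is a rough sketch in C of splitting a
store file name without any regexp.  It is not the code from that
commit: the function name store_path_name_part and the HASH_LENGTH
macro are invented for this example, and the accepted characters are
the ones enumerated in the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 32 characters allowed in a store hash, listed explicitly so that no
   locale-dependent range is involved.  */
static const char hash_alphabet[] = "0123456789abcdfghijklmnpqrsvwxyz";

#define HASH_LENGTH 32

/* Hypothetical helper: if PATH has the form STORE/<32-char hash>-<name>
   with no further '/', return a pointer to <name> inside PATH; otherwise
   return NULL.  */
const char *
store_path_name_part (const char *store, const char *path)
{
  size_t store_len, i;
  const char *hash, *name;

  assert (store != NULL && path != NULL);

  store_len = strlen (store);
  if (strncmp (path, store, store_len) != 0 || path[store_len] != '/')
    return NULL;

  hash = path + store_len + 1;
  for (i = 0; i < HASH_LENGTH; i++)
    /* Check for NUL first: strchr would otherwise match the terminator
       of hash_alphabet.  */
    if (hash[i] == '\0' || strchr (hash_alphabet, hash[i]) == NULL)
      return NULL;

  if (hash[HASH_LENGTH] != '-')
    return NULL;

  name = hash + HASH_LENGTH + 1;
  if (*name == '\0' || strchr (name, '/') != NULL)
    return NULL;

  return name;
}
```

Only byte comparisons are involved, so the outcome does not depend on
LC_ALL or on how the C library implements bracket ranges.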

Thanks,
Ludo’.




Information forwarded to bug-guix@HIDDEN:
bug#39970; Package guix. Full text available.

Message received at 39970 <at> debbugs.gnu.org:


Received: (at 39970) by debbugs.gnu.org; 12 Mar 2020 11:02:12 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Mar 12 07:02:12 2020
Received: from localhost ([127.0.0.1]:56026 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jCLbQ-0004ld-7R
	for submit <at> debbugs.gnu.org; Thu, 12 Mar 2020 07:02:12 -0400
Received: from pelzflorian.de ([5.45.111.108]:38946 helo=mail.pelzflorian.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <pelzflorian@HIDDEN>) id 1jCLbN-0004lT-AH
 for 39970 <at> debbugs.gnu.org; Thu, 12 Mar 2020 07:02:10 -0400
Received: from pelzflorian.localdomain (unknown [5.45.111.108])
 by mail.pelzflorian.de (Postfix) with ESMTPSA id 9E70436055C;
 Thu, 12 Mar 2020 12:02:07 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=pelzflorian.de;
 s=mail; t=1584010927;
 bh=AANki3xAx4PXrO6y9Mh4j/xrDg8K+0BCPVEVHgXefPM=;
 h=Date:From:To:Cc:Subject:References:In-Reply-To;
 b=bcjRaZB9HOT37d3pXA3fe7cwv7ITMAhWwTIyfkpOifroB94rWUS91kjYKVozCpZSE
 GjZcHr4ffH4n/pE7pmlzw70yQwgNQ0/oNOuT5yl0ybdNMbEA4lVDAVnMlvm2d+wrYD
 Rz9Ntw8MHDHXfVygdz5EIrJdk7rR+1wGmvfjfjvQ=
Date: Thu, 12 Mar 2020 12:02:06 +0100
From: "pelzflorian (Florian Pelz)" <pelzflorian@HIDDEN>
To: Ludovic =?utf-8?Q?Court=C3=A8s?= <ludo@HIDDEN>
Subject: Re: bug#39970: guix commands broken on Azerbaijani 'az_AZ' and
 Turkish 'tr_TR' locales
Message-ID: <20200312110206.2hsinzejnmcefmot@HIDDEN>
References: <20200307120052.ocwzphlvemvmb2ts@HIDDEN>
 <20200307152003.myj7jkjthokbmark@HIDDEN>
 <20200308070804.ylpb5yrwpgbc3p3w@HIDDEN>
 <8736ah1mxb.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="n32ce3wcv3t3334v"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <8736ah1mxb.fsf@HIDDEN>
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 39970
Cc: 39970 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)


--n32ce3wcv3t3334v
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Mon, Mar 09, 2020 at 06:02:40PM +0100, Ludovic Courtès wrote:
> To me it’s not a bug in Guile, but simply the fact that regexps, as
> implemented by the C library, are locale-dependent.
> 

(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^([a-z]+)$")
             "iyiyim")
⇒ #f

Guile’s behavior that i is not among [a-z] has been confirmed as
unexpected by a friend of mine who is a native Turkish speaker.  It is
different from the behavior of current glibc:

florian@florianmacbook ~$ cat iyiyim.c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#define STR "iyiyım"
int main (int    argc,
          char** argv)
{
  regex_t only_letters;
  int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
  if (r != 0)
    printf ("This error does not happen.\n");
  r = regexec (&only_letters, STR, 1, malloc (sizeof (regmatch_t)), 0);
  if (r == 0)
    printf ("The string " STR " matched!\n");
  else
    printf ("No match for " STR ".\n");
}
florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim 
The string iyiyım matched!

Apparently Guile uses a bundled regular expression library rather than
glibc.  I can try making Guile use a newer Gnulib for its regular
expressions, maybe that helps.  Shall I file a separate bug for Guile?

> The patch you proposed looks good to me, though perhaps we could
> explicitly list all the alphabet in the regexp?
> 
> A better option is to reimplement ‘store-path-package-name’ in a way
> similar to ‘store-path-hash-part’, as in commit
> 35eb77b09d957019b2437e7681bd88013d67d3cd.

I suppose it would be better to cache the compiled regexp.  What is
this mcached syntax inside (guix store)?  Or do I use Scheme’s 'delay'
and 'force' for caching?

The attached patch fixes the regexp.  Shall I push the attached patch
and then try making it cache the compiled regexp or do you still
prefer an implementation without regexps?  Why would not using a
regexp be better?

Regards,
Florian

--n32ce3wcv3t3334v
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment;
	filename="0001-store-Fix-many-guix-commands-failing-on-some-locales.patch"

From: Florian Pelz <pelzflorian@HIDDEN>
Date: Thu, 12 Mar 2020 11:08:16 +0100
Subject: [PATCH] store: Fix many guix commands failing on some locales.

Fixes bug #39970 (see: https://bugs.gnu.org/39970).

At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' are affected.

* guix/store.scm (store-regexp*): Avoid dependence on locale.
---
 guix/store.scm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..82d7403bb6 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1949,7 +1949,8 @@ valid inputs."
   (mlambda (store)
     "Return a regexp matching a file in STORE."
     (make-regexp (string-append "^" (regexp-quote store)
-                                "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+                                "\
+/([0-9abcdfghijklmnpqrsvwxyz]{32})-([^/]+)$"))))
 
 (define (store-path-package-name path)
   "Return the package name part of PATH, a file name in the store."
-- 
2.25.1


--n32ce3wcv3t3334v--




Information forwarded to bug-guix@HIDDEN:
bug#39970; Package guix. Full text available.

Message received at 39970 <at> debbugs.gnu.org:


Received: (at 39970) by debbugs.gnu.org; 9 Mar 2020 17:02:51 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Mar 09 13:02:51 2020
Received: from localhost ([127.0.0.1]:51406 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jBLnn-0002U5-0g
	for submit <at> debbugs.gnu.org; Mon, 09 Mar 2020 13:02:51 -0400
Received: from eggs.gnu.org ([209.51.188.92]:40113)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1jBLnl-0002Tq-DK
 for 39970 <at> debbugs.gnu.org; Mon, 09 Mar 2020 13:02:49 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:46764)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <ludo@HIDDEN>)
 id 1jBLng-00075B-2X; Mon, 09 Mar 2020 13:02:44 -0400
Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (port=56274 helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <ludo@HIDDEN>)
 id 1jBLnf-0000Ws-Cx; Mon, 09 Mar 2020 13:02:43 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian@HIDDEN>
Subject: Re: bug#39970: guix commands broken on Azerbaijani 'az_AZ' and
 Turkish 'tr_TR' locales
References: <20200307120052.ocwzphlvemvmb2ts@HIDDEN>
 <20200307152003.myj7jkjthokbmark@HIDDEN>
 <20200308070804.ylpb5yrwpgbc3p3w@HIDDEN>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: 20 =?utf-8?Q?Vent=C3=B4se?= an 228 de la =?utf-8?Q?R?=
 =?utf-8?Q?=C3=A9volution?=
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Mon, 09 Mar 2020 18:02:40 +0100
In-Reply-To: <20200308070804.ylpb5yrwpgbc3p3w@HIDDEN>
 (pelzflorian@HIDDEN's message of "Sun, 8 Mar 2020 08:08:04
 +0100")
Message-ID: <8736ah1mxb.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 39970
Cc: 39970 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.7 (-)

Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@HIDDEN> skribis:

> This seems similar to <https://bugs.gnu.org/35785>.

Yes, same story.

> I think enumerating all characters explicitly is a similar fix,
> whether or not there is a bug in Guile.

To me it’s not a bug in Guile, but simply the fact that regexps, as
implemented by the C library, are locale-dependent.

The patch you proposed looks good to me, though perhaps we could
explicitly list all the alphabet in the regexp?

A better option is to reimplement ‘store-path-package-name’ in a way
similar to ‘store-path-hash-part’, as in commit
35eb77b09d957019b2437e7681bd88013d67d3cd.

Thoughts?

Ludo’.




Information forwarded to bug-guix@HIDDEN:
bug#39970; Package guix. Full text available.

Message received at 39970 <at> debbugs.gnu.org:


Received: (at 39970) by debbugs.gnu.org; 8 Mar 2020 07:08:13 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Mar 08 03:08:13 2020
Received: from localhost ([127.0.0.1]:47843 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jAq2l-0004Ys-M2
	for submit <at> debbugs.gnu.org; Sun, 08 Mar 2020 03:08:12 -0400
Received: from pelzflorian.de ([5.45.111.108]:60416 helo=mail.pelzflorian.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <pelzflorian@HIDDEN>) id 1jAq2h-0004Yh-K4
 for 39970 <at> debbugs.gnu.org; Sun, 08 Mar 2020 03:08:08 -0400
Received: from pelzflorian.localdomain (unknown [5.45.111.108])
 by mail.pelzflorian.de (Postfix) with ESMTPSA id D7B393604F7
 for <39970 <at> debbugs.gnu.org>; Sun,  8 Mar 2020 08:08:05 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=pelzflorian.de;
 s=mail; t=1583651285;
 bh=H8wLe+1JHgwFsr5xYovoRoQ+aLGhou+WW6jlY2STzyk=;
 h=Date:From:To:Subject:References:In-Reply-To;
 b=tdGtzU34p0QiUvjmrjI80kbccdKcnEti0vCVBf0DpXc91zUUpaOD67+7bT7byTpqk
 yuyRClWE9X7e6J+WbpqbmbxxxFHEQWmZ+gBqsWPQwi3DpXLUD6Z3JOtBYRskyHnzEc
 wMuUMiijyC4AwZ4rphNv872mNhMmk0qxutS9GbjM=
Date: Sun, 8 Mar 2020 08:08:04 +0100
From: "pelzflorian (Florian Pelz)" <pelzflorian@HIDDEN>
To: 39970 <at> debbugs.gnu.org
Subject: Re: bug#39970: guix commands broken on Azerbaijani 'az_AZ' and
 Turkish 'tr_TR' locales
Message-ID: <20200308070804.ylpb5yrwpgbc3p3w@HIDDEN>
References: <20200307120052.ocwzphlvemvmb2ts@HIDDEN>
 <20200307152003.myj7jkjthokbmark@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20200307152003.myj7jkjthokbmark@HIDDEN>
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 39970
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

This seems similar to <https://bugs.gnu.org/35785>.  I think
enumerating all characters explicitly is a similar fix, whether or not
there is a bug in Guile.

Regards,
Florian




Information forwarded to bug-guix@HIDDEN:
bug#39970; Package guix. Full text available.

Message received at 39970 <at> debbugs.gnu.org:


Received: (at 39970) by debbugs.gnu.org; 7 Mar 2020 15:20:08 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Mar 07 10:20:08 2020
Received: from localhost ([127.0.0.1]:47389 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jAbFH-0007eJ-NG
	for submit <at> debbugs.gnu.org; Sat, 07 Mar 2020 10:20:07 -0500
Received: from pelzflorian.de ([5.45.111.108]:59254 helo=mail.pelzflorian.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <pelzflorian@HIDDEN>) id 1jAbFG-0007eA-9Q
 for 39970 <at> debbugs.gnu.org; Sat, 07 Mar 2020 10:20:07 -0500
Received: from pelzflorian.localdomain (unknown [5.45.111.108])
 by mail.pelzflorian.de (Postfix) with ESMTPSA id D844A3604F7
 for <39970 <at> debbugs.gnu.org>; Sat,  7 Mar 2020 16:20:04 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=pelzflorian.de;
 s=mail; t=1583594404;
 bh=gsrcqTx00cvuN+3NQyTQ4jOfeqHmeAVV0iEcKHbpTXY=;
 h=Date:From:To:Subject:References:In-Reply-To;
 b=aR3ZZKVVSFQp0XKfQ3OR6jzNJMifbmY+LVVLaFgZcELMB9ZkiYN/InKMe5UAOeEZs
 2HcwhP0t74ySv+cYTOtuFLfriWx2J+FaETOU1/Fgob4jPFRljv7pNqY1g5MRxeuOmr
 f7WR5f/DhyY+L3Uqy0OnWWyqwIH7vSUw6Ayewa+0=
Date: Sat, 7 Mar 2020 16:20:03 +0100
From: "pelzflorian (Florian Pelz)" <pelzflorian@HIDDEN>
To: 39970 <at> debbugs.gnu.org
Subject: Re: bug#39970: guix commands broken on Azerbaijani 'az_AZ' and
 Turkish 'tr_TR' locales
Message-ID: <20200307152003.myj7jkjthokbmark@HIDDEN>
References: <20200307120052.ocwzphlvemvmb2ts@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20200307120052.ocwzphlvemvmb2ts@HIDDEN>
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 39970
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

On Sat, Mar 07, 2020 at 01:00:52PM +0100, pelzflorian (Florian Pelz) wrote:
> Running guix via ./pre-inst-env gives a more useful backtrace.  The
> reason is that in guix/store.scm
> 
> (use-modules (ice-9 regex))
> (regexp-exec (make-regexp "^/gnu/store/([0-9a-df-np-sv-z]{32})-([^/]+)$")
>              "/gnu/store/bv9py3f2dsa5iw0aijqjv9zxwprcy1nb-fontconfig-2.13.1.drv")
> 
> evaluates to #f in Turkish, possibly because of the presence of
> dotless i (ı) in the range.
> 

Actually it seems the issue is that i is missing from the range [a-z].
ı and ğ are missing as well, as are non-Turkish letters like ä that
are included when using the en_US.utf8 locale, even though they are not
English letters either.

(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^([a-z]+)$")
             "iyiyim")

fails.

But running a glibc C program

florian@florianmacbook ~$ cat iyiyim.c
#include <regex.h>
#include <stdio.h>
#define STR "iyiyim"
int main (int    argc,
          char** argv)
{
  regex_t only_letters;
  int r = regcomp (&only_letters, "[a-z]", 0);
  if (r != 0)
    printf ("This error does not happen.\n");
  r = regexec (&only_letters, STR, 0, NULL, 0);
  if (r == 0)
    printf ("The string " STR " matched!\n");
  else
    printf ("No match for " STR ".\n");
}
florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c 
florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim 
The string iyiyim matched!

succeeds on tr_TR.utf8 and en_US.utf8 locales (and a native Turkish
speaker confirmed to me that ı and i should be in the alphabet right
after h).  Maybe this is a bug in Guile, somehow?

> […]
> I wonder what else is affected; the installer maybe?  I have not
> tested yet.
>

I checked; the graphical installer appears unaffected, but the issue
appears on the installed system.

Regards,
Florian




Information forwarded to bug-guix@HIDDEN:
bug#39970; Package guix. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 7 Mar 2020 12:01:03 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Mar 07 07:01:03 2020
Received: from localhost ([127.0.0.1]:46326 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jAY8d-00076x-Au
	for submit <at> debbugs.gnu.org; Sat, 07 Mar 2020 07:01:03 -0500
Received: from lists.gnu.org ([209.51.188.17]:51469)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <pelzflorian@HIDDEN>) id 1jAY8b-00076e-LQ
 for submit <at> debbugs.gnu.org; Sat, 07 Mar 2020 07:01:02 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:54923)
 by lists.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <pelzflorian@HIDDEN>) id 1jAY8Y-0005KP-5q
 for bug-guix@HIDDEN; Sat, 07 Mar 2020 07:01:00 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,URIBL_BLOCKED
 autolearn=disabled version=3.3.2
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <pelzflorian@HIDDEN>) id 1jAY8X-0001G5-17
 for bug-guix@HIDDEN; Sat, 07 Mar 2020 07:00:58 -0500
Received: from pelzflorian.de ([5.45.111.108]:38970 helo=mail.pelzflorian.de)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <pelzflorian@HIDDEN>)
 id 1jAY8W-0001Bf-G9
 for bug-guix@HIDDEN; Sat, 07 Mar 2020 07:00:56 -0500
Received: from pelzflorian.localdomain (unknown [5.45.111.108])
 by mail.pelzflorian.de (Postfix) with ESMTPSA id A388A3604F7
 for <bug-guix@HIDDEN>; Sat,  7 Mar 2020 13:00:53 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=pelzflorian.de;
 s=mail; t=1583582453;
 bh=ZZun/EedyaYOGn1SbwQJGv0StjxN5hjB1YzmJSzwN+Y=;
 h=Date:From:To:Subject;
 b=zqJFLa89uJhi1xo68JZj4DNwZnk89W5Lya7qX9jkpwpDDWr7vf3vKkavuECD15JaU
 nKxJ8Rqpw6Ga7OXzqG9BRXjLn+Osvpdj7H9QxX9sDRyrqaogerINeXzIlVjCNr5kdX
 QffRTL1k7WN9bB4NrG+usmqv9WuNVugZNAWyMY+Y=
Date: Sat, 7 Mar 2020 13:00:52 +0100
From: "pelzflorian (Florian Pelz)" <pelzflorian@HIDDEN>
To: bug-guix@HIDDEN
Subject: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR'
 locales
Message-ID: <20200307120052.ocwzphlvemvmb2ts@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="n7jv6se5e47ytj2s"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
 [fuzzy]
X-Received-From: 5.45.111.108
X-Spam-Score: 0.2 (/)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.8 (/)


--n7jv6se5e47ytj2s
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

After running

export LC_ALL=tr_TR.utf8

many important Guix commands like 'guix environment', 'guix install'
and 'guix pull' fail.

$ guix environment --ad-hoc hello
Backtrace:
           1 (primitive-load "/home/florian/.config/guix/current/bin…")
In guix/ui.scm:
  1826:12  0 (run-guix-command _ . _)

guix/ui.scm:1826:12: In procedure run-guix-command:
In procedure string-length: Wrong type argument in position 1 (expecting string): #f


Running guix via ./pre-inst-env gives a more useful backtrace.  The
reason is that in guix/store.scm

(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^/gnu/store/([0-9a-df-np-sv-z]{32})-([^/]+)$")
             "/gnu/store/bv9py3f2dsa5iw0aijqjv9zxwprcy1nb-fontconfig-2.13.1.drv")

evaluates to #f in Turkish, possibly because of the presence of
dotless i (ı) in the range.

The attached patch fixes the issue by including i explicitly, but I
believe enumerating all of [0-9abcdfghijklmnpqrsvwxyz] explicitly
might be more future-proof.

Shall I push the patch modified to list all letters in
[0-9abcdfghijklmnpqrsvwxyz] explicitly?  Numbers too?  I suppose there
is no downside to listing all without ranges.

I wonder what else is affected; the installer maybe?  I have not
tested yet.

Regards,
Florian

--n7jv6se5e47ytj2s
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment;
	filename="0001-store-Fix-many-guix-commands-failing-on-some-locales.patch"

From 4445284e9fd40b3e271fa7b511d2856c03c8ccfb Mon Sep 17 00:00:00 2001
From: Florian Pelz <pelzflorian@HIDDEN>
Date: Sat, 7 Mar 2020 11:38:59 +0100
Subject: [PATCH] store: Fix many guix commands failing on some locales.

At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' are affected.

* guix/store.scm (store-regexp*): Avoid dependence on locale.
---
 guix/store.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..a1d9713c24 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1949,7 +1949,7 @@ valid inputs."
   (mlambda (store)
     "Return a regexp matching a file in STORE."
     (make-regexp (string-append "^" (regexp-quote store)
-                                "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+                                "/([0-9a-df-hij-np-sv-z]{32})-([^/]+)$"))))
 
 (define (store-path-package-name path)
   "Return the package name part of PATH, a file name in the store."
-- 
2.25.0


--n7jv6se5e47ytj2s--




Acknowledgement sent to "pelzflorian (Florian Pelz)" <pelzflorian@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-guix@HIDDEN. Full text available.
Report forwarded to bug-guix@HIDDEN:
bug#39970; Package guix. Full text available.
Last modified: Thu, 12 Mar 2020 16:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.