GNU bug report logs - #22913
filenames mangled by locale

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: guile; Reported by: Zefram <zefram@HIDDEN>; dated Sat, 5 Mar 2016 00:44:02 UTC; Maintainer for guile is bug-guile@HIDDEN.

Message received at submit <at> debbugs.gnu.org:


Date: Sat, 5 Mar 2016 00:42:51 +0000
From: Zefram <zefram@HIDDEN>
To: bug-guile@HIDDEN
Subject: filenames mangled by locale
Message-ID: <20160305004251.GF7946@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

It seems that guile-2.0 applies locale encoding and decoding to pathnames
being used in system calls.  This radically breaks file access anywhere
that the locale's character encoding is anything other than a simple
8-bit encoding such as ISO-8859-1.  For example, in the default C locale
with its nominal ASCII encoding,

$ guile-2.0 -c '(open-file (list->string (map integer->char '\''(76 195 169 111 110))) "w")'
$ echo L*n | od -tc
0000000   L   ?   ?   o   n  \n
0000006

Those are literal question marks in the name of the file actually
created, apparently arising as substitutions for the high-half octets in
the requested filename.  Existing files with names containing high-half
octets can't be found (resulting in an ENOENT error message that shows the
actually-existing filename), and new ones can't be created (actually being
created under the mangled name instead).  There's no warning or exception
advising that the requested name can't be used, just this misbehaviour.
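
The substitution described above can be imitated outside Guile. The following is only a sketch of the observed effect, not Guile's actual code path: it uses tr in the C locale to replace every octet in the range 0x80-0xFF with a question mark, then dumps both names as decimal octet values. The variable names are illustrative.

```shell
# Requested name: "L", 0xC3, 0xA9, "on" (the octets used in the report).
requested=$(printf 'L\303\251on')

# Imitate the observed mangling: every high-half octet becomes "?".
# LC_ALL=C makes tr operate on single bytes; "[?*]" pads the
# replacement set to the length of the source range.
mangled=$(printf '%s' "$requested" | LC_ALL=C tr '\200-\377' '[?*]')

# Show both names as decimal octet values.
printf '%s' "$requested" | od -An -tu1
printf '%s' "$mangled" | od -An -tu1
```

In the second dump the octets 195 and 169 appear as 63, the code for a literal question mark, which matches the name of the file that was actually created.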

The equivalent problem arises with decoding when filenames are received:

$ echo foo > $'L\303\251on.txt'
$ guile-2.0 -c '(define d (opendir ".")) (let r () (let ((n (readdir d))) (if (eof-object? n) #t (begin (if (eq? (car (reverse (string->list n))) #\t) (begin (write (map char->integer (string->list n))) (newline))) (r)))))'
(76 63 63 111 110 46 116 120 116)

Again no warning or exception, just incorrect data returned.
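
One consequence of silent substitution is that the returned name no longer identifies the file. As a sketch (again imitating the effect with tr rather than running Guile; the helper mangle and the second filename are made up for illustration), two distinct on-disk names collapse to the same string:

```shell
# Hypothetical helper imitating the observed substitution of high-half octets.
mangle() {
    printf '%s' "$1" | LC_ALL=C tr '\200-\377' '[?*]'
}

name_a=$(printf 'L\303\251on.txt')   # the name from the report
name_b=$(printf 'L\351\350on.txt')   # a different, made-up name

# The names differ byte for byte, yet their mangled forms are equal.
if [ "$name_a" != "$name_b" ] && [ "$(mangle "$name_a")" = "$(mangle "$name_b")" ]; then
    echo "distinct names, identical after mangling: $(mangle "$name_a")"
fi
```

A program that receives only the mangled form has no way to tell which of the two files it was given, so it cannot pass the name back to a system call reliably.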

Working around this would require the program to select a locale with
a more accommodating nominal character encoding.  As I've previously
noted, there's no guarantee that such a locale exists.  Thus the above
behaviour is fatal to any attempt to write, in Guile Scheme, a program
that operates on arbitrarily-named files.
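
Whether such a locale is available on a given system can be checked from the shell. The sketch below looks for an installed locale whose name advertises ISO-8859-1. The function name pick_8bit_locale and the name-matching pattern are assumptions, and locales that use ISO-8859-1 without saying so in their name will be missed.

```shell
# Hypothetical helper: read locale names on stdin and print the first one
# whose name ends in an ISO-8859-1 codeset suffix (e.g. "en_GB.iso88591").
pick_8bit_locale() {
    grep -i -E '\.iso-?8859-?1$' | head -n 1
}

# "locale -a" lists installed locales on glibc systems; it may print nothing.
chosen=$(locale -a 2>/dev/null | pick_8bit_locale || true)

if [ -n "$chosen" ]; then
    echo "candidate locale: $chosen"
else
    echo "no ISO-8859-1 locale found; the workaround is unavailable here"
fi
```

If a candidate is found, it could be supplied through LC_ALL when starting guile-2.0. On a system where the second branch is taken, the program has no locale to fall back on, which is the situation the paragraph above describes.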

Guile even applies this mangling to the pathname of a script that it is
to load:

$ echo '(write "hi")(newline)' > $'L\303\251on.scm'
$ guile-2.0 -s L*n.scm
[big error message saying it couldn't find the file that exists]

Obviously, even if a program could turn off the locale mangling in
general, this instance of it occurs too early for the program to avoid.
The guile framework itself has acquired the kind of 8-bit-cleanliness
bug that it is imposing on the programs that it interprets.

-zefram




Acknowledgement sent to Zefram <zefram@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-guile@HIDDEN. Full text available.
Report forwarded to bug-guile@HIDDEN:
bug#22913; Package guile. Full text available.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.