Date: Sun, 14 Aug 2016 22:36:16 +0100
From: Zefram <zefram@HIDDEN>
To: 20823 <at> debbugs.gnu.org
Subject: Re: bug#20823: argv mangled by locale
Message-ID: <20160814213616.GA22491@HIDDEN>
In-Reply-To: <87a8ibjeum.fsf@HIDDEN>

Andy Wingo wrote:
> I also don't know whether to supply an optional "encoding" argument,
> and use that encoding to decode the command line arguments.

If you don't fancy the profusion of extra "encoding" parameters on argv
access (this ticket), environment access (bug#20822), and all sorts of
syscalls (bug#22913), you could bundle them all together in a fluid.
This would be a bit like the %default-port-encoding fluid, but setlocale
should absolutely not modify it. It should follow the scheme that I laid
out in bug#24186: its value can be either a string naming an encoding,
or #:locale-at-io meaning that whenever encoding is required the
currently selected locale is consulted.

There should also be a fluid determining the conversion strategy, like
the existing %default-port-conversion-strategy. These two fluids
together would control the encoding and decoding for all operations that
currently apply the locale encoding to arbitrary data. (Decoding
locale-supplied messages is a different matter.)

-zefram
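The two-fluid idea above can be modelled outside Guile. The following is a rough Python sketch, not Guile code: `contextvars.ContextVar` stands in for a fluid, and the names `LOCALE_AT_IO`, `data_encoding`, `conversion_strategy`, `binding` and `decode_external` are all hypothetical. Python's error-handler names ("strict", "replace") are used in place of Guile's conversion strategies, which is an assumption made only for illustration.

```python
import contextlib
import contextvars
import locale

# Hypothetical sentinel standing in for the proposed #:locale-at-io value.
LOCALE_AT_IO = object()

# Two dynamically scoped settings, analogous to the two proposed fluids.
# Nothing here is modified by locale.setlocale().
data_encoding = contextvars.ContextVar("data_encoding", default=LOCALE_AT_IO)
conversion_strategy = contextvars.ContextVar("conversion_strategy", default="strict")


@contextlib.contextmanager
def binding(var, value):
    """Bind a context variable for the duration of a with-block."""
    token = var.set(value)
    try:
        yield
    finally:
        var.reset(token)


def current_encoding():
    """Resolve the encoding: a fixed name, or the locale as it is right now."""
    value = data_encoding.get()
    if value is LOCALE_AT_IO:
        return locale.getpreferredencoding(False)
    return value


def decode_external(octets):
    """Decode arbitrary external data under the current settings."""
    return octets.decode(current_encoding(), conversion_strategy.get())


with binding(data_encoding, "iso-8859-1"):
    as_latin1 = decode_external(b"L\xc3\xa9on")

with binding(data_encoding, "utf-8"):
    as_utf8 = decode_external(b"L\xc3\xa9on")
```

With an explicit encoding bound, the result no longer depends on the environment; the locale is consulted only when the sentinel is in effect, and then at the time of the call rather than at startup.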
bug-guile@HIDDEN: bug#20823; Package guile.
Date: Fri, 24 Jun 2016 09:42:52 +0100
From: Zefram <zefram@HIDDEN>
To: Andy Wingo <wingo@HIDDEN>
Cc: 20823 <at> debbugs.gnu.org, mhw@HIDDEN, ludo@HIDDEN
Subject: Re: bug#20823: argv mangled by locale
Message-ID: <20160624084252.GE1170@HIDDEN>
In-Reply-To: <87a8ibjeum.fsf@HIDDEN>

Andy Wingo wrote:
> I also don't know whether to supply an optional "encoding" argument,
> and use that encoding to decode the command line arguments.

That, or something that just retrieves octets, is necessary. Decoding
via the selected locale does not suffice, because there's no guarantee
that there'll be a locale with a cooperative encoding.

-zefram
Date: Fri, 24 Jun 2016 08:11:29 +0200
From: Andy Wingo <wingo@HIDDEN>
To: mhw@HIDDEN, ludo@HIDDEN
Cc: 20823 <at> debbugs.gnu.org, Zefram <zefram@HIDDEN>
Subject: Re: bug#20823: argv mangled by locale
Message-ID: <87a8ibjeum.fsf@HIDDEN>
In-Reply-To: <20150616043300.GB2718@HIDDEN>

On Tue 16 Jun 2015 06:33, Zefram <zefram@HIDDEN> writes:

> I don't see any Scheme interface that reliably retrieves the command
> line arguments without locale decoding.
[...]
> The actual data passed between processes is an octet string, and
> there really needs to be some reliable way to access that octet string.
> My comments about resolution in bug#20822 "environment mangled by locale"
> mostly apply here too, with a slight change: it seems necessary to store
> the original octet strings and decode at the time program-arguments is
> called. With that change, the decoding can be responsive to setlocale
> (and in particular can reliably use ISO-8859-1 in the absence of
> setlocale).

Proposal: scm_i_set_boot_program_arguments just copies the bytes, and
scm_program_arguments decodes them.

I don't know whether to save the locale that was current at program
start and use that locale to decode the arguments, or default to the
current locale, or what. I also don't know whether to supply an optional
"encoding" argument, and use that encoding to decode the command line
arguments.

Thoughts, Mark and Ludovic?

Andy
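The shape of this proposal (copy the bytes at startup, decode only when the arguments are requested) can be sketched as follows. This is an illustrative Python model, not libguile code: the class `StoredArguments` and its methods are hypothetical names, and the optional `encoding` parameter corresponds to the open question in the message rather than to any settled interface.

```python
import locale


class StoredArguments:
    """Hypothetical holder: keeps argv as octets and decodes only on request."""

    def __init__(self, raw_args):
        # Copy the bytes untouched at startup; no locale is consulted here.
        self._raw = [bytes(arg) for arg in raw_args]

    def octets(self):
        """Return the original octet strings exactly as received."""
        return list(self._raw)

    def decoded(self, encoding=None, errors="replace"):
        """Decode at call time, using an explicit encoding or the current locale's."""
        if encoding is None:
            encoding = locale.getpreferredencoding(False)
        return [arg.decode(encoding, errors) for arg in self._raw]


args = StoredArguments([b"guile", b"L\xc3\xa9on"])

raw_second = args.octets()[1]
utf8_second = args.decoded("utf-8")[1]
latin1_second = args.decoded("iso-8859-1")[1]
```

Because the stored form is the octet string, different callers can decode the same argument differently, and the original bytes remain available whatever the decoding produces.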
Date: Fri, 4 Mar 2016 23:24:41 +0000
From: Zefram <zefram@HIDDEN>
To: 20823 <at> debbugs.gnu.org
Subject: Re: bug#20823: argv mangled by locale
Message-ID: <20160304232441.GB13009@HIDDEN>
In-Reply-To: <20150616043300.GB2718@HIDDEN>

I wrote:
> My comments about resolution in bug#20822 "environment mangled by locale"
> mostly apply here too,

The revised comments that I have just made on that ticket also apply
here. Short version: "absence of setlocale" isn't a useful criterion,
so explicit control of encoding will be necessary.

-zefram
Date: Tue, 16 Jun 2015 05:33:00 +0100
From: Zefram <zefram@HIDDEN>
To: bug-guile@HIDDEN
Subject: argv mangled by locale
Message-ID: <20150616043300.GB2718@HIDDEN>

When guile-2.0 stores argv for later access via program-arguments, it
sometimes decodes the underlying octet string according to the nominal
character encoding of the locale suggested by the environment. This is
a problem, because the arguments are not necessarily encoded that way,
and may not even be encodings of character strings at all. The decoding
is lossy, where the octet string isn't consistent with the character
encoding, so the original octet string cannot be recovered from the
mangled form. I don't see any Scheme interface that reliably retrieves
the command line arguments without locale decoding.

The decoding doesn't follow the usual rules for locale control. It is
not at all sensitive to setlocale, which is understandable due to the
arguments being acquired before any of the actual program's code runs.
Empirically, if the environment nominates no locale, "POSIX", or a
non-existent locale, then argv is decoded according to ISO-8859-1, thus
preserving the octets. If the environment nominates an extant locale
other than "POSIX", then argv is decoded according to that locale's
nominal character encoding. Demos:

$ env - guile-2.0 -c '(write (map char->integer (string->list (cadr (program-arguments))))) (newline)' $'L\xc3\xa9on'
(76 195 169 111 110)
$ env - LANG=C guile-2.0 -c '(write (map char->integer (string->list (cadr (program-arguments))))) (newline)' $'L\xc3\xa9on'
(76 63 63 111 110)
$ env - LANG=de_DE.utf8 guile-2.0 -c '(write (map char->integer (string->list (cadr (program-arguments))))) (newline)' $'L\xc3\xa9on'
(76 233 111 110)
$ env - LANG=de_DE.iso88591 guile-2.0 -c '(write (map char->integer (string->list (cadr (program-arguments))))) (newline)' $'L\xc3\xa9on'
(76 195 169 111 110)

The actual data passed between processes is an octet string, and there
really needs to be some reliable way to access that octet string. My
comments about resolution in bug#20822 "environment mangled by locale"
mostly apply here too, with a slight change: it seems necessary to store
the original octet strings and decode at the time program-arguments is
called. With that change, the decoding can be responsive to setlocale
(and in particular can reliably use ISO-8859-1 in the absence of
setlocale).

-zefram
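The three distinct results in the demos can be reproduced without Guile by decoding the same five octets in different ways. This is a small Python model offered only as an illustration: `decode_ascii_with_question_marks` is a hypothetical helper that imitates the observed LANG=C output (one "?" per non-ASCII octet) and is not a claim about how Guile implements it.

```python
# The five octets passed as the argument in the demos: L, 0xC3, 0xA9, o, n.
RAW = b"L\xc3\xa9on"


def code_points(text):
    """Return the code points of a string, like (map char->integer ...)."""
    return [ord(ch) for ch in text]


def decode_ascii_with_question_marks(octets):
    """Model an ASCII decode that substitutes '?' for each non-ASCII octet."""
    return "".join(chr(b) if b < 0x80 else "?" for b in octets)


latin1 = RAW.decode("iso-8859-1")
ascii_lossy = decode_ascii_with_question_marks(RAW)
utf8 = RAW.decode("utf-8")

print(code_points(latin1))       # [76, 195, 169, 111, 110]
print(code_points(ascii_lossy))  # [76, 63, 63, 111, 110]
print(code_points(utf8))         # [76, 233, 111, 110]

# ISO-8859-1 maps every octet to exactly one code point, so it round-trips.
latin1_round_trip = latin1.encode("iso-8859-1") == RAW

# The '?'-substituting decode cannot be reversed: both octets became 0x3F.
ascii_round_trip = ascii_lossy.encode("ascii") == RAW

# An octet string that is not valid UTF-8 is also mangled by a lenient decode.
not_utf8 = b"L\xe9on"
utf8_round_trip = not_utf8.decode("utf-8", "replace").encode("utf-8") == not_utf8
```

Only the ISO-8859-1 decoding preserves the octets; the other two flags come out false, which is the lossiness described in the report.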
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.