From: Linas Vepstas <linasvepstas@HIDDEN>
To: bug-guile@HIDDEN
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Sun, 8 Jan 2017 12:16:23 -0600

There appears to be a regression in guile-2.2 with utf8 handling
in the scm_puts() scm_lfwrite() and scm_c_put_string() functions.

In guile-2.0, one could give these utf8-encoded strings, and these
would display just fine. In 2.2 they get mangled. The source of the
mangling seems to be an assumption that these three are being given
latin1 strings, which they then attempt to convert to utf8, thus
wrecking the encoding. See, e.g. libguile/ports.c line 3526

Presumably this change was intentional, but I don't understand why;
guile-2.0 seems utf-8 clean, correctly handling utf-8 in essentially
all cases. Why would one want to go back to the bad old days of
latin1 and iso-8859-1 for guile 2.2?

I could submit a patch for this, but would it be wanted?

Test case is straight-forward:

   printf("duuude port-encoding is=%s\n",
          scm_to_utf8_string(scm_port_encoding(scm_current_output_port ())));
   scm_puts ("係 拉 丁 字 母", scm_current_output_port ());

which works in guile-2.0 but is garbled in 2.2
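
The mangling described above is what happens when bytes that are already UTF-8 are treated as Latin-1 code points and encoded to UTF-8 a second time. The sketch below reproduces that effect in plain C, without libguile. `latin1_to_utf8` is a hypothetical helper written for illustration; it is not the code in libguile/ports.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Re-encode a NUL-terminated byte string as UTF-8, treating every input
   byte as one Latin-1 (ISO-8859-1) code point.  Hypothetical helper for
   illustration only, not Guile code.  Returns the number of bytes
   written to OUT, excluding the terminating NUL.  */
size_t
latin1_to_utf8 (const char *in, char *out, size_t out_size)
{
  size_t n = 0;

  assert (in != NULL && out != NULL);
  for (const unsigned char *p = (const unsigned char *) in; *p; p++)
    {
      /* Conservatively require room for a 2-byte sequence plus NUL.  */
      assert (n + 3 <= out_size);
      if (*p < 0x80)
        out[n++] = (char) *p;             /* ASCII passes through */
      else
        {
          /* Code points 0x80..0xFF need two UTF-8 bytes.  */
          out[n++] = (char) (0xC0 | (*p >> 6));
          out[n++] = (char) (0x80 | (*p & 0x3F));
        }
    }
  out[n] = '\0';
  return n;
}
```

Given the UTF-8 bytes of "Småland" (where å is C3 A5), the helper emits C3 83 C2 A5 for that one character, which a UTF-8 terminal renders as the two characters Ã¥. Pure ASCII input comes out unchanged, which is why the problem only shows up with non-ASCII text.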

From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: linasvepstas@HIDDEN
Subject: bug#25397: Acknowledgement (guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string)
Date: Sun, 08 Jan 2017 18:17:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-guile@HIDDEN

If you wish to submit further information on this problem, please
send it to 25397 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
25397: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=25397
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems

From: Andy Wingo <wingo@HIDDEN>
To: Linas Vepstas <linasvepstas@HIDDEN>
Cc: 25397 <at> debbugs.gnu.org
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Mon, 09 Jan 2017 23:03:27 +0100

On Sun 08 Jan 2017 19:16, Linas Vepstas <linasvepstas@HIDDEN> writes:

> There appears to be a regression in guile-2.2 with utf8 handling
> in the scm_puts() scm_lfwrite() and scm_c_put_string() functions.
>
> In guile-2.0, one could give these utf8-encoded strings, and these
> would display just fine. In 2.2 they get mangled.

Could it be this from NEWS:

** Better locale support in Guile scripts

When Guile is invoked directly, either from the command line or via a
hash-bang line (e.g. "#!/usr/bin/guile"), it now installs the current
locale via a call to `(setlocale LC_ALL "")'. For users with a unicode
locale, this makes all ports unicode-capable by default, without the
need to call `setlocale' in your program. This behavior may be
controlled via the GUILE_INSTALL_LOCALE environment variable; see the
manual for more.
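
The NEWS entry above only covers Guile invoked directly. A C program that embeds libguile presumably still has to install the locale itself. As a rough, libguile-free sketch of what installing the environment's locale means at the C level, assuming a POSIX system (`nl_langinfo` is POSIX, not ISO C), the hypothetical helper `install_env_locale` below calls `setlocale (LC_ALL, "")` and reports the resulting character encoding:

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Install the locale named by the environment (LC_ALL, LANG, ...) and
   return the name of the character encoding now in effect.  *OK is set
   to 1 if setlocale succeeded and to 0 if the environment names a
   locale the system does not know, in which case the process stays in
   whatever locale it had before (the "C" locale at program start).
   Hypothetical helper for illustration only.  */
const char *
install_env_locale (int *ok)
{
  assert (ok != NULL);
  *ok = (setlocale (LC_ALL, "") != NULL);
  return nl_langinfo (CODESET);
}
```

The returned name depends on the environment: with a UTF-8 locale it is typically "UTF-8", while the plain "C" locale reports an ASCII codeset whose exact name varies by system.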

From: Linas Vepstas <linasvepstas@HIDDEN>
To: Andy Wingo <wingo@HIDDEN>
Cc: 25397 <at> debbugs.gnu.org
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Mon, 9 Jan 2017 21:34:36 -0600

This short C program illustrates the issue. The locale, the output port etc.
are UTF-8. The bad results are no surprise: the code currently in git
for scm_puts etc. explicitly ignores the locale setting, always, and always
assumes latin1 -- it's hard-coded in there.

--linas

#include <libguile.h>

void *wrap_eval(void* p)
{
   char *wtf = "(setlocale LC_ALL \"\")";
   SCM eval_str = scm_from_utf8_string(wtf);
   scm_eval_string(eval_str);
   return NULL;
}

void *wrap_puts(void* p)
{
   char *wtf = p;

   SCM port = scm_current_output_port ();

   scm_puts("the port-encoding is=", port);
   scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);

   scm_puts("\nThe string to display is =", port);
   scm_puts (wtf, port);

   scm_puts("\nWas expecting to see this=", port);
   SCM str = scm_from_utf8_string(wtf);
   scm_display(str, port);
   scm_puts("\n\n", port);

   return NULL;
}

int main(int argc, char* argv[])
{
   scm_with_guile(wrap_eval, 0x0);

   char * wtf = "Ćićolina";
   scm_with_guile(wrap_puts, wtf);

   wtf = "Thủ Dầu Một";
   scm_with_guile(wrap_puts, wtf);

   wtf = "Småland";
   scm_with_guile(wrap_puts, wtf);

   wtf = "Hòa Phú Phú Tân";
   scm_with_guile(wrap_puts, wtf);

   wtf = "係 拉 丁 字 母";
   scm_with_guile(wrap_puts, wtf);
}

The output is always this:

the port-encoding is=UTF-8
The string to display is =Ä†iÄ‡olina
Was expecting to see this=Ćićolina

the port-encoding is=UTF-8
The string to display is =Thá»§ Dáº§u Má»™t
Was expecting to see this=Thủ Dầu Một

the port-encoding is=UTF-8
The string to display is =SmÃ¥land
Was expecting to see this=Småland

the port-encoding is=UTF-8
The string to display is =HÃ²a PhÃº PhÃº TÃ¢n
Was expecting to see this=Hòa Phú Phú Tân

the port-encoding is=UTF-8
Was expecting to see this=係 拉 丁 字 母 æ¯

What's cool is that all this stuff works in email!

--linas

From: Andy Wingo <wingo@HIDDEN>
To: Linas Vepstas <linasvepstas@HIDDEN>
Cc: 25397 <at> debbugs.gnu.org
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Wed, 01 Mar 2017 16:45:26 +0100

On Tue 10 Jan 2017 04:34, Linas Vepstas <linasvepstas@HIDDEN> writes:

> void *wrap_puts(void* p)
> {
>    char *wtf = p;
>
>    SCM port = scm_current_output_port ();
>
>    scm_puts("the port-encoding is=", port);
>    scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);
>
>    scm_puts("\nThe string to display is =", port);
>    scm_puts (wtf, port);
>
>    scm_puts("\nWas expecting to see this=", port);
>    SCM str = scm_from_utf8_string(wtf);
>    scm_display(str, port);
>    scm_puts("\n\n", port);
>
>    return NULL;
> }

So, there are a few questions here. scm_puts and scm_lfwrite are not
documented, so we need to do basic science on them to see what they are
supposed to do.

Firstly, is scm_puts() a textual interface or a binary interface?
I.e. does it write a sequence of characters or a sequence of bytes?

If I look at uses of scm_puts in Guile sources, it seems clear that it's
a textual interface. That is to say, at all points, the intention seems
to be to write characters on a Guile port. All of the uses are of
strings. Please do a "git grep" on your source to see if your
perceptions correspond.

Now the question is, what encoding is the argument in? If the port is
UTF-16, that byte string should be decoded to characters, and that
character sequence encoded to UTF-16.

All of the scm_puts calls in Guile are of one-byte characters with
codepoints less than 128, so when doing some port refactoring I chose to
interpret the argument as latin1.

FTR, in Guile 2.0, this was effectively a binary interface. Guile 2.0's
scm_lfwrite interpreted the incoming bytes as ISO-8859-1 codepoints for
the purposes of updating line and column, but scm_puts and scm_lfwrite
just wrote out the bytes to the port directly, regardless of the
encoding. That was the wrong thing.

Are you arguing that the byte string given to scm_puts should be decoded
from UTF-8? That would be OK.

Andy
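
The decode-then-encode step described above for a UTF-16 port can be sketched in plain C as follows. This is an illustration under stated assumptions, not Guile's port code: `utf8_decode_one` and `utf8_to_utf16` are hypothetical names, the input is assumed to be NUL-terminated UTF-8, and the decoder is simplified (it does not reject overlong forms or encoded surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at S into *CP.  Returns the number
   of bytes consumed, or 0 if the sequence is malformed or truncated.  */
static size_t
utf8_decode_one (const unsigned char *s, uint32_t *cp)
{
  size_t len;

  if (s[0] < 0x80)                { *cp = s[0];        len = 1; }
  else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
  else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
  else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
  else
    return 0;                     /* stray continuation or invalid lead */

  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;                 /* also stops at a premature NUL */
      *cp = (*cp << 6) | (s[i] & 0x3F);
    }
  return len;
}

/* Transcode NUL-terminated UTF-8 text IN into UTF-16 code units in OUT,
   which has room for OUT_LEN units.  Returns the number of code units
   written, or (size_t) -1 on malformed input or insufficient space.  */
size_t
utf8_to_utf16 (const char *in, uint16_t *out, size_t out_len)
{
  const unsigned char *p = (const unsigned char *) in;
  size_t n = 0;

  assert (in != NULL && out != NULL);
  while (*p)
    {
      uint32_t cp;
      size_t len = utf8_decode_one (p, &cp);

      /* Conservatively require room for a surrogate pair.  */
      if (len == 0 || cp > 0x10FFFF || n + 2 > out_len)
        return (size_t) -1;

      if (cp < 0x10000)
        out[n++] = (uint16_t) cp;
      else
        {
          cp -= 0x10000;          /* split into a surrogate pair */
          out[n++] = (uint16_t) (0xD800 | (cp >> 10));
          out[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
      p += len;
    }
  return n;
}
```

The point of the two-stage design is that the source encoding of the C string and the encoding of the port are independent: the first stage depends only on how the argument is interpreted (UTF-8 here, Latin-1 in the refactored 2.2 code as described above), and the second only on the port.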

From: Linas Vepstas <linasvepstas@HIDDEN>
To: Andy Wingo <wingo@HIDDEN>
Cc: 25397 <at> debbugs.gnu.org
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Wed, 1 Mar 2017 14:18:54 -0600

In the bad old days, not everything was documented ... My use of scm_puts
dates back to guile-1.8. I only ever send it utf8. I can change my code,
no problem,... I just thought I'd report a regression in case .... others
are affected.

Linas