Subject: bug#26058: utf16->string and utf32->string don't conform to R6RS
From: Taylan Ulrich Bayırlı/Kammer <taylanbayirli@HIDDEN>
To: bug-guile@HIDDEN
Date: Sat, 11 Mar 2017 13:19:44 +0100

See the R6RS Libraries document, page 10. The differences:

- R6RS supports reading a BOM.

- R6RS mandates an endianness argument to specify the behavior in the
  absence of a BOM.

- R6RS allows an optional third argument, 'endianness-mandatory', to
  explicitly ignore any possible BOM.

Here's a quick patch on top of master. I didn't test it thoroughly...
===File /home/taylan/src/guile/guile-master/0001-Fix-R6RS-utf16-string-and-utf32-string.patch===
From f51cd1d4884caafb1ed0072cd77c0e3145f34576 Mon Sep 17 00:00:00 2001
From: Taylan Ulrich Bayırlı/Kammer <taylanbayirli@HIDDEN>
Date: Fri, 10 Mar 2017 22:36:55 +0100
Subject: [PATCH] Fix R6RS utf16->string and utf32->string.

* module/rnrs/bytevectors.scm (read-bom16, read-bom32): New procedures.
(r6rs-utf16->string, r6rs-utf32->string): Ditto.
---
 module/rnrs/bytevectors.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/module/rnrs/bytevectors.scm b/module/rnrs/bytevectors.scm
index 9744359f0..997a8c9cb 100644
--- a/module/rnrs/bytevectors.scm
+++ b/module/rnrs/bytevectors.scm
@@ -69,7 +69,9 @@
            bytevector-ieee-double-native-set!
 
            string->utf8 string->utf16 string->utf32
-           utf8->string utf16->string utf32->string))
+           utf8->string
+           (r6rs-utf16->string . utf16->string)
+           (r6rs-utf32->string . utf32->string)))
 
 
 (load-extension (string-append "libguile-" (effective-version))
@@ -80,4 +82,52 @@
        `(quote ,sym)
        (error "unsupported endianness" sym)))
 
+(define (read-bom16 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1)))
+    (cond
+     ((and (= c0 #xFE) (= c1 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf16->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom16 bv)))
+       (if (not bom-endianness)
+           (utf16->string bv default-endianness)
+           (substring/shared (utf16->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf16->string bv endianness)
+         (r6rs-utf16->string bv endianness)))))
+
+(define (read-bom32 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1))
+        (c2 (bytevector-u8-ref bv 2))
+        (c3 (bytevector-u8-ref bv 3)))
+    (cond
+     ((and (= c0 #x00) (= c1 #x00) (= c2 #xFE) (= c3 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE) (= c2 #x00) (= c3 #x00))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf32->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom32 bv)))
+       (if (not bom-endianness)
+           (utf32->string bv default-endianness)
+           (substring/shared (utf32->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf32->string bv endianness)
+         (r6rs-utf32->string bv endianness)))))
+
 ;;; bytevector.scm ends here
-- 
2.11.0

============================================================
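As posted, `read-bom16` and `read-bom32` call `bytevector-u8-ref` at fixed indices, so the two-argument path of `r6rs-utf16->string` and `r6rs-utf32->string` would raise an out-of-range error on a bytevector shorter than two bytes (four for UTF-32), including the empty bytevector, rather than simply finding no BOM. A minimal length-guarded sketch, assuming only the standard `(rnrs bytevectors)` procedures; the names `starts-with-octets?`, `bom16-endianness`, and `bom32-endianness` are hypothetical and not part of Guile:

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: true if BV starts with the octets in the list
;; OCTETS.  The length check runs first, so a short bytevector is never
;; indexed out of range.
(define (starts-with-octets? bv octets)
  (and (>= (bytevector-length bv) (length octets))
       (let loop ((i 0) (rest octets))
         (or (null? rest)
             (and (= (bytevector-u8-ref bv i) (car rest))
                  (loop (+ i 1) (cdr rest)))))))

;; Hypothetical guarded counterpart of the patch's read-bom16:
;; returns 'big, 'little, or #f (no BOM, or too few bytes).
(define (bom16-endianness bv)
  (cond ((starts-with-octets? bv '(#xFE #xFF)) 'big)
        ((starts-with-octets? bv '(#xFF #xFE)) 'little)
        (else #f)))

;; Hypothetical guarded counterpart of the patch's read-bom32.
(define (bom32-endianness bv)
  (cond ((starts-with-octets? bv '(#x00 #x00 #xFE #xFF)) 'big)
        ((starts-with-octets? bv '(#xFF #xFE #x00 #x00)) 'little)
        (else #f)))
```

Keeping one detector per encoding, as the patch does, matters because the little-endian UTF-32 BOM (FF FE 00 00) also begins with the little-endian UTF-16 BOM (FF FE).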
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: taylanbayirli@HIDDEN (Taylan Ulrich Bayırlı/Kammer)
Subject: bug#26058: Acknowledgement (utf16->string and utf32->string don't conform to R6RS)
Date: Sat, 11 Mar 2017 12:14:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-guile@HIDDEN

If you wish to submit further information on this problem, please
send it to 26058 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
26058: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=26058
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems
Subject: bug#26058: utf16->string and utf32->string don't conform to R6RS
From: Andy Wingo <wingo@HIDDEN>
To: taylanbayirli@HIDDEN (Taylan Ulrich Bayırlı/Kammer)
Cc: 26058 <at> debbugs.gnu.org
Date: Mon, 13 Mar 2017 14:03:14 +0100

On Sat 11 Mar 2017 13:19, taylanbayirli@HIDDEN (Taylan Ulrich Bayırlı/Kammer) writes:

> See the R6RS Libraries document, page 10. The differences:
>
> - R6RS supports reading a BOM.
>
> - R6RS mandates an endianness argument to specify the behavior in the
>   absence of a BOM.
>
> - R6RS allows an optional third argument, 'endianness-mandatory', to
>   explicitly ignore any possible BOM.
>
> Here's a quick patch on top of master. I didn't test it thoroughly...

Hi,

this is a tricky area that is not so amenable to quick patches :) Have
you looked into what Guile already does for byte-order marks? Can you
explain how the R6RS specification relates to this?

  https://www.gnu.org/software/guile/manual/html_node/BOM-Handling.html

Andy
Subject: bug#26058: utf16->string and utf32->string don't conform to R6RS
From: taylanbayirli@HIDDEN (Taylan Ulrich Bayırlı/Kammer)
To: Andy Wingo <wingo@HIDDEN>
Cc: 26058 <at> debbugs.gnu.org
Date: Mon, 13 Mar 2017 19:10:36 +0100

Andy Wingo <wingo@HIDDEN> writes:

> Hi,
>
> this is a tricky area that is not so amenable to quick patches :) Have
> you looked into what Guile already does for byte-order marks? Can you
> explain how the R6RS specification relates to this?
>
>   https://www.gnu.org/software/guile/manual/html_node/BOM-Handling.html
>
> Andy

Hmm, interesting. I noticed that the utf{16,32}->string procedures don't
honor a BOM at the start of the given bytevector, but I didn't look at
it from a ports perspective.

TL;DR of the below: the R6RS semantics are a strict enrichment of the
feature set of the utfX->string procedures relative to the Guile
semantics, so at worst we would end up with spurious features (namely,
the optional ability to handle any BOM at the start of the bytevector,
with a fallback endianness in case none is found).

That said, let's see...

If I do a textual read from a port, I already get a string and not a
bytevector, so the behavior of the utfX->string procedures is irrelevant.

If I do binary I/O, the following situations are possible:

1. I'm guaranteed to get any possible bytes that happen to form a valid
   BOM at the start of the stream as-is in the returned bytevector; the
   binary I/O interface doesn't see such bytes as anything special, as
   it could simply be coincidence that the stream starts with such
   bytes.

2. I'm guaranteed *not* to get bytes that form a BOM at the start of the
   stream; instead they're consumed to set the port encoding for any
   future text I/O.

3. The behavior is unspecified and either of the above may happen.

In case #1, it's probably good for the utfX->string procedures to be
able to handle BOMs, while also letting the caller explicitly ignore any
possible BOM. The R6RS semantics cover this.

In case #2, the utfX->string procedures don't need to handle BOMs as far
as bytevectors returned by port I/O are concerned, but it doesn't hurt
if they optionally support it. The R6RS semantics are fine here as well,
I think.

As for #3... first of all, it's bad IMO; the behavior ought to be
specified. :-) But in any case, the additional features of the R6RS
semantics won't hurt. WDYT?

As far as I understand the page you linked, Guile currently implements
#3, which I think is unfortunate, though I can kind of understand it. In
any case, the additional R6RS features won't hurt, right?

Taylan
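Taylan's argument that the R6RS semantics are harmless in both case #1 (BOM bytes kept) and case #2 (BOM already consumed) can be checked with a small sketch. It assumes only that Guile's existing `utf16->string` from `(rnrs bytevectors)` decodes a BOM-free bytevector in the byte order it is given; the names `drop-octets`, `utf16-bom`, and `utf16->string/r6rs` are hypothetical. Unlike the posted patch, this version skips the BOM bytes before decoding instead of dropping the first decoded character, so all actual decoding stays in the existing procedure.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: a fresh bytevector holding BV minus its first N octets.
(define (drop-octets bv n)
  (let* ((len (- (bytevector-length bv) n))
         (out (make-bytevector len)))
    (bytevector-copy! bv n out 0 len)
    out))

;; Hypothetical helper: 'big or 'little if BV starts with a UTF-16 BOM,
;; #f otherwise (including bytevectors shorter than two octets).
(define (utf16-bom bv)
  (and (>= (bytevector-length bv) 2)
       (let ((b0 (bytevector-u8-ref bv 0))
             (b1 (bytevector-u8-ref bv 1)))
         (cond ((and (= b0 #xFE) (= b1 #xFF)) 'big)
               ((and (= b0 #xFF) (= b1 #xFE)) 'little)
               (else #f)))))

(define utf16->string/r6rs
  (case-lambda
    ;; Two arguments: a leading BOM, if present, overrides ENDIANNESS
    ;; and is not decoded as a character.
    ((bv endianness)
     (let ((bom (utf16-bom bv)))
       (if bom
           (utf16->string (drop-octets bv 2) bom)
           (utf16->string bv endianness))))
    ;; Three arguments: if MANDATORY? is true, ENDIANNESS is used as-is
    ;; and any BOM is decoded like an ordinary character.
    ((bv endianness mandatory?)
     (if mandatory?
         (utf16->string bv endianness)
         (utf16->string/r6rs bv endianness)))))
```

For example, the case #1 input `(u8-list->bytevector '(#xFF #xFE #x68 #x00 #x69 #x00))` decoded with a default of `'big`, and the case #2 input `(u8-list->bytevector '(#x68 #x00 #x69 #x00))` decoded with `'little`, should both yield `"hi"`.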
Subject: bug#26058: utf16->string and utf32->string don't conform to R6RS
From: Andy Wingo <wingo@HIDDEN>
To: taylanbayirli@HIDDEN (Taylan Ulrich Bayırlı/Kammer)
Cc: 26058 <at> debbugs.gnu.org
Date: Mon, 13 Mar 2017 22:24:42 +0100

On Mon 13 Mar 2017 19:10, taylanbayirli@HIDDEN (Taylan Ulrich Bayırlı/Kammer) writes:

> If I do binary I/O, the following situations are possible:
>
> 1. I'm guaranteed to get any possible bytes that happen to form a valid
>    BOM at the start of the stream as-is in the returned bytevector; the
>    binary I/O interface doesn't see such bytes as anything special, as
>    it could simply be coincidence that the stream starts with such
>    bytes.
>
> 2. I'm guaranteed *not* to get bytes that form a BOM at the start of the
>    stream; instead they're consumed to set the port encoding for any
>    future text I/O.
>
> 3. The behavior is unspecified and either of the above may happen.

(1).  But I thought this bug was about using a bytevector as a source
and then doing textual I/O on it, no?

Andy
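Andy's answer, (1), can be observed directly: reading a bytevector back through a binary port returns any leading BOM bytes unchanged. A minimal sketch, assuming Guile's `(rnrs io ports)`, where `open-bytevector-input-port` called without a transcoder yields a binary port; the variable names are illustrative only.

```scheme
(use-modules (rnrs bytevectors) (rnrs io ports))

;; "hi" in UTF-16LE, preceded by a little-endian BOM (FF FE).
(define source (u8-list->bytevector '(#xFF #xFE #x68 #x00 #x69 #x00)))

;; Binary read: no transcoder is given, so the port does no BOM processing.
(define read-back
  (get-bytevector-all (open-bytevector-input-port source)))

;; The BOM is still the first two octets of what was read.
(display (bytevector->u8-list read-back))
(newline)
```

Because such a bytevector still carries its BOM, a decoder with the R6RS two-argument semantics is what turns it back into the intended string.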
Subject: bug#26058: utf16->string and utf32->string don't conform to R6RS
From: taylanbayirli@HIDDEN (Taylan Ulrich Bayırlı/Kammer)
To: Andy Wingo <wingo@HIDDEN>
Cc: 26058 <at> debbugs.gnu.org
Date: Tue, 14 Mar 2017 16:03:00 +0100

Andy Wingo <wingo@HIDDEN> writes:

> On Mon 13 Mar 2017 19:10, taylanbayirli@HIDDEN (Taylan Ulrich Bayırlı/Kammer) writes:
>
>> If I do binary I/O, the following situations are possible:
>>
>> 1. I'm guaranteed to get any possible bytes that happen to form a valid
>>    BOM at the start of the stream as-is in the returned bytevector; the
>>    binary I/O interface doesn't see such bytes as anything special, as
>>    it could simply be coincidence that the stream starts with such
>>    bytes.
>
> (1).  But I thought this bug was about using a bytevector as a source
> and then doing textual I/O on it, no?

I have a feeling we're somehow talking past each other. :-) As far as
I'm concerned, the bug is just that the procedures don't conform to the
specification.

It would of course be good if the behavior of these procedures was
somehow in harmony with the behavior of I/O operations, but I don't see
any issues arising from adopting the R6RS behavior of the procedures
utf16->string and utf32->string. Do you?

Taylan
From: Andy Wingo <wingo@HIDDEN>
To: taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer")
Cc: 26058 <at> debbugs.gnu.org
Subject: bug#26058: utf16->string and utf32->string don't conform to R6RS
Date: Tue, 14 Mar 2017 16:44:37 +0100

On Tue 14 Mar 2017 16:03, taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer") writes:

> Andy Wingo <wingo@HIDDEN> writes:
>
>> On Mon 13 Mar 2017 19:10, taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer") writes:
>>
>>> If I do binary I/O, the following situations are possible:
>>>
>>> 1. I'm guaranteed to get any possible bytes that happen to form a valid
>>>    BOM at the start of the stream as-is in the returned bytevector; the
>>>    binary I/O interface doesn't see such bytes as anything special, as
>>>    it could simply be coincidence that the stream starts with such
>>>    bytes.
>>
>> (1). But I thought this bug was about using a bytevector as a source
>> and then doing textual I/O on it, no?
>
> I have a feeling we're somehow talking past each other. :-) As far as
> I'm concerned, the bug is just that the procedures don't conform to the
> specification.
>
> It would of course be good if the behavior of these procedures was
> somehow in harmony with the behavior of I/O operations, but I don't see
> any issues arising from adopting the R6RS behavior of the procedures
> utf16->string and utf32->string. Do you?

Adopting the behavior is more or less fine. If it can be done while relying on the existing behavior, that is better than something ad-hoc in a module.

Andy
From: taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer")
To: Andy Wingo <wingo@HIDDEN>
Cc: 26058 <at> debbugs.gnu.org
Subject: bug#26058: utf16->string and utf32->string don't conform to R6RS
Date: Thu, 16 Mar 2017 20:34:14 +0100

Andy Wingo <wingo@HIDDEN> writes:

> Adopting the behavior is more or less fine. If it can be done while
> relying on the existing behavior, that is better than something ad-hoc
> in a module.

You mean somehow leveraging the existing BOM handling code of Guile (found in the ports code) would be preferable to reimplementing it like in this patch, correct?

In that light, I had this attempt:

    (define r6rs-utf16->string
      (case-lambda
        ((bv default-endianness)
         (let* ((binary-port (open-bytevector-input-port bv))
                (transcoder (make-transcoder (utf-16-codec)))
                (utf16-port (transcoded-port binary-port transcoder)))
           ;; XXX how to set default-endianness for a port?
           (get-string-all utf16-port)))
        ((bv endianness endianness-mandatory?)
         (if endianness-mandatory?
             (utf16->string bv endianness)
             (r6rs-utf16->string bv endianness)))))

As commented in the first branch of the case-lambda, this does not yet make use of the 'default-endianness' parameter to tell the port transcoder (or whoever) what to do in case no BOM is found in the stream.

From what I can tell, Guile is currently hardcoded to *transparently* default to big-endian in ports.c, port_clear_stream_start_for_bom_read. Is there a way to detect when Guile was unable to find a BOM? (In that case one could set the endianness explicitly to the desired default.)

Or do you see another way to implement this?

Thanks for the feedback!

Taylan

P.S.: Huge congrats on the big release. :-)
From: Mark H Weaver <mhw@HIDDEN>
To: taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer")
Cc: Andy Wingo <wingo@HIDDEN>, 26058 <at> debbugs.gnu.org
Subject: bug#26058: utf16->string and utf32->string don't conform to R6RS
Date: Mon, 15 Oct 2018 00:57:41 -0400

Hi Taylan,

taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer") writes:

> Andy Wingo <wingo@HIDDEN> writes:
>
>> Adopting the behavior is more or less fine. If it can be done while
>> relying on the existing behavior, that is better than something ad-hoc
>> in a module.

In general, I agree with Andy's sentiment that it would be better to avoid redundant BOM handling code, and moreover, I appreciate his reluctance to apply a fix without careful consideration of our existing BOM semantics.

However, as Taylan discovered, Guile does not provide a mechanism to specify a default endianness of a UTF-16 or UTF-32 port in case a BOM is not found. I see no straightforward way to implement these R6RS interfaces using ports.

We could certainly add such a mechanism if needed, but I see another problem with this approach: the expense of creating and later collecting a bytevector port object would be a very heavy burden to place on these otherwise fairly lightweight operations.
Therefore, I would prefer to avoid that implementation strategy for these operations.

Although BOM handling for ports is quite complex with many subtle points to consider, detecting a BOM at the beginning of a bytevector is so trivial that I personally have no objection to this tiny duplication of logic.

Therefore, my preference would be to adopt code similar to that proposed by Taylan, although I believe it can, and should, be further simplified:

> diff --git a/module/rnrs/bytevectors.scm b/module/rnrs/bytevectors.scm
> index 9744359f0..997a8c9cb 100644
> --- a/module/rnrs/bytevectors.scm
> +++ b/module/rnrs/bytevectors.scm
> @@ -69,7 +69,9 @@
>            bytevector-ieee-double-native-set!
>
>            string->utf8 string->utf16 string->utf32
> -          utf8->string utf16->string utf32->string))
> +          utf8->string
> +          (r6rs-utf16->string . utf16->string)
> +          (r6rs-utf32->string . utf32->string)))
>
>
>  (load-extension (string-append "libguile-" (effective-version))
> @@ -80,4 +82,52 @@
>        `(quote ,sym)
>        (error "unsupported endianness" sym)))
>
> +(define (read-bom16 bv)
> +  (let ((c0 (bytevector-u8-ref bv 0))
> +        (c1 (bytevector-u8-ref bv 1)))
> +    (cond
> +     ((and (= c0 #xFE) (= c1 #xFF))
> +      'big)
> +     ((and (= c0 #xFF) (= c1 #xFE))
> +      'little)
> +     (else
> +      #f))))

We should gracefully handle the case of an empty bytevector, returning an empty string without error in that case.

Also, we should use a single 'bytevector-u16-ref' operation to check for the BOM. Pick an arbitrary endianness for the operation (big-endian?), and compare the resulting integer with both #xFEFF and #xFFFE. That way, the code will be simpler and more efficient. Note that our VM has dedicated instructions for these multi-byte bytevector accessors, and there will be fewer comparison operations as well.

Similarly for the utf32 case.

What do you think?
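A minimal sketch of the single-read BOM check described above, in Guile Scheme using `(rnrs bytevectors)`. It reads the first code unit once in big-endian order and compares it against both byte orders; inputs too short to hold a BOM (including the empty bytevector) simply report no BOM instead of raising an out-of-range error. The names `read-bom16` and `read-bom32` follow the patch under review, but these bodies are an illustration written for this note, not code taken from the patch.

```scheme
(use-modules (rnrs bytevectors))

;; Return 'big or 'little if BV starts with a UTF-16 byte-order mark,
;; else #f.  One 16-bit big-endian read covers both byte orders:
;; bytes FE FF read as #xFEFF, bytes FF FE read as #xFFFE.
(define (read-bom16 bv)
  (and (>= (bytevector-length bv) 2)          ; empty or 1-byte input: no BOM
       (case (bytevector-u16-ref bv 0 (endianness big))
         ((#xFEFF) 'big)
         ((#xFFFE) 'little)
         (else #f))))

;; The UTF-32 analogue: bytes 00 00 FE FF read as #x0000FEFF,
;; bytes FF FE 00 00 read as #xFFFE0000.
(define (read-bom32 bv)
  (and (>= (bytevector-length bv) 4)          ; too short for a 4-byte BOM
       (case (bytevector-u32-ref bv 0 (endianness big))
         ((#x0000FEFF) 'big)
         ((#xFFFE0000) 'little)
         (else #f))))
```

With this shape, `(read-bom16 #vu8())` evaluates to #f, so the empty-bytevector case falls through to ordinary decoding with the default endianness.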
> +(define r6rs-utf16->string
> +  (case-lambda
> +    ((bv default-endianness)
> +     (let ((bom-endianness (read-bom16 bv)))
> +       (if (not bom-endianness)
> +           (utf16->string bv default-endianness)
> +           (substring/shared (utf16->string bv bom-endianness) 1))))

Better to use plain 'substring' here, I think. The machinery of shared substrings is more expensive, and unnecessary in this case.

Otherwise, it looks good to me. Would you like to propose a revised patch?

Andy, what do you think?

Mark
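Putting the review's points together, a complete UTF-16 procedure might look roughly like this sketch. It assumes, as the patch under review does, that the underlying `utf16->string` decodes with exactly the endianness it is given and does no BOM processing of its own. One deliberate deviation from the review: instead of decoding the BOM and dropping the first character with `substring`, it decodes only the bytes after the BOM, trading the string copy for a bytevector copy (either is a plain, unshared copy). `bytevector-drop` is a hypothetical helper defined here, not a Guile procedure.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: a fresh copy of BV without its first N bytes.
(define (bytevector-drop bv n)
  (let* ((len (- (bytevector-length bv) n))
         (out (make-bytevector len)))
    (bytevector-copy! bv n out 0 len)          ; R6RS argument order
    out))

;; Sketch of R6RS utf16->string semantics on top of a decoder that
;; ignores BOMs (the assumption stated above).
(define r6rs-utf16->string
  (case-lambda
    ((bv default-endianness)
     (if (zero? (bytevector-length bv))
         ""                                    ; empty input: no error
         ;; One big-endian 16-bit read identifies either BOM.
         (let ((bom (and (>= (bytevector-length bv) 2)
                         (case (bytevector-u16-ref bv 0 (endianness big))
                           ((#xFEFF) 'big)
                           ((#xFFFE) 'little)
                           (else #f)))))
           (if bom
               ;; BOM found: it decides the endianness and is not decoded.
               (utf16->string (bytevector-drop bv 2) bom)
               ;; No BOM: fall back to the caller's default.
               (utf16->string bv default-endianness)))))
    ((bv endian endianness-mandatory?)
     (if endianness-mandatory?
         ;; Endianness is fixed; a leading FEFF is ordinary data.
         (utf16->string bv endian)
         (r6rs-utf16->string bv endian)))))
```

For example, `(r6rs-utf16->string #vu8(#xFF #xFE #x61 #x00) 'big)` returns "a": the little-endian BOM overrides the big-endian default. A UTF-32 version would follow the same pattern with a 4-byte BOM, `bytevector-u32-ref`, and `utf32->string`.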
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.