GNU bug report logs - #26058
utf16->string and utf32->string don't conform to R6RS


Package: guile; Reported by: taylanbayirli@HIDDEN ("Taylan Ulrich Bayırlı/Kammer"); dated Sat, 11 Mar 2017 12:14:01 UTC; Maintainer for guile is bug-guile@HIDDEN.

Message received at 26058 <at> debbugs.gnu.org:


Received: (at 26058) by debbugs.gnu.org; 15 Oct 2018 04:58:09 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Oct 15 00:58:09 2018
Received: from localhost ([127.0.0.1]:49714 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gBuxF-00087f-Hf
	for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 00:58:09 -0400
Received: from world.peace.net ([64.112.178.59]:49132)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mhw@HIDDEN>) id 1gBuxD-00087N-MW
 for 26058 <at> debbugs.gnu.org; Mon, 15 Oct 2018 00:58:08 -0400
Received: from mhw by world.peace.net with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
 (envelope-from <mhw@HIDDEN>)
 id 1gBux3-0000YW-S1; Mon, 15 Oct 2018 00:57:58 -0400
From: Mark H Weaver <mhw@HIDDEN>
To: taylanbayirli@HIDDEN (Taylan Ulrich =?utf-8?Q?=22Bay=C4=B1rl=C4=B1?=
 =?utf-8?Q?=2FKammer=22?=)
Subject: Re: bug#26058: utf16->string and utf32->string don't conform to R6RS
References: <87o9x83t0f.fsf@HIDDEN> <87shmhqqgd.fsf@HIDDEN>
 <87h92xyrmr.fsf@HIDDEN> <87bmt4rht1.fsf@HIDDEN>
 <87d1djzysb.fsf@HIDDEN> <877f3r7ti2.fsf@HIDDEN>
 <87r31xdnih.fsf@HIDDEN>
Date: Mon, 15 Oct 2018 00:57:41 -0400
In-Reply-To: <87r31xdnih.fsf@HIDDEN> ("Taylan Ulrich
 \=\?utf-8\?Q\?\=5C\=22Ba\?\=
 \=\?utf-8\?Q\?y\=C4\=B1rl\=C4\=B1\=2FKammer\=5C\=22\=22's\?\=
 message of "Thu, 16 Mar 2017 20:34:14 +0100")
Message-ID: <878t2zst6i.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 26058
Cc: Andy Wingo <wingo@HIDDEN>, 26058 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

Hi Taylan,

taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer") writes:

> Andy Wingo <wingo@HIDDEN> writes:
>
>> Adopting the behavior is more or less fine.  If it can be done while
>> relying on the existing behavior, that is better than something ad-hoc
>> in a module.

In general, I agree with Andy's sentiment that it would be better to
avoid redundant BOM handling code, and moreover, I appreciate his
reluctance to apply a fix without careful consideration of our existing
BOM semantics.

However, as Taylan discovered, Guile does not provide a mechanism to
specify a default endianness of a UTF-16 or UTF-32 port in case a BOM is
not found.  I see no straightforward way to implement these R6RS
interfaces using ports.

We could certainly add such a mechanism if needed, but I see another
problem with this approach: the expense of creating and later collecting
a bytevector port object would be a very heavy burden to place on these
otherwise fairly lightweight operations.  Therefore, I would prefer to
avoid that implementation strategy for these operations.

Although BOM handling for ports is quite complex with many subtle points
to consider, detecting a BOM at the beginning of a bytevector is so
trivial that I personally have no objection to this tiny duplication of
logic.

Therefore, my preference would be to adopt code similar to that proposed
by Taylan, although I believe it can, and should, be further simplified:

> diff --git a/module/rnrs/bytevectors.scm b/module/rnrs/bytevectors.scm
> index 9744359f0..997a8c9cb 100644
> --- a/module/rnrs/bytevectors.scm
> +++ b/module/rnrs/bytevectors.scm
> @@ -69,7 +69,9 @@
>             bytevector-ieee-double-native-set!
>
>             string->utf8 string->utf16 string->utf32
> -           utf8->string utf16->string utf32->string))
> +           utf8->string
> +           (r6rs-utf16->string . utf16->string)
> +           (r6rs-utf32->string . utf32->string)))
>
>
>  (load-extension (string-append "libguile-" (effective-version))
> @@ -80,4 +82,52 @@
>        `(quote ,sym)
>        (error "unsupported endianness" sym)))
>
> +(define (read-bom16 bv)
> +  (let ((c0 (bytevector-u8-ref bv 0))
> +        (c1 (bytevector-u8-ref bv 1)))
> +    (cond
> +     ((and (= c0 #xFE) (= c1 #xFF))
> +      'big)
> +     ((and (= c0 #xFF) (= c1 #xFE))
> +      'little)
> +     (else
> +      #f))))

We should gracefully handle the case of an empty bytevector, returning
an empty string without error in that case.

Also, we should use a single 'bytevector-u16-ref' operation to check for
the BOM.  Pick an arbitrary endianness for the operation (big-endian?),
and compare the resulting integer with both #xFEFF and #xFFFE.  That
way, the code will be simpler and more efficient.  Note that our VM has
dedicated instructions for these multi-byte bytevector accessors, and
there will be fewer comparison operations as well.  Similarly for the
utf32 case.
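
A minimal sketch of the single-read check described above, assuming
Guile's (rnrs bytevectors) module.  The name 'bom16-endianness' is
hypothetical and is not part of the proposed patch.  The length test
means an empty (or one-byte) bytevector simply yields #f instead of
raising an error.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper (not from the patch): detect a UTF-16 BOM with a
;; single 16-bit read.  Returns 'big, 'little, or #f when no BOM is found.
(define (bom16-endianness bv)
  (and (>= (bytevector-length bv) 2)      ; empty or too short: no BOM
       ;; Read the first two bytes as one big-endian integer.
       (let ((n (bytevector-u16-ref bv 0 (endianness big))))
         (cond ((= n #xFEFF) 'big)        ; bytes FE FF
               ((= n #xFFFE) 'little)     ; bytes FF FE
               (else #f)))))
```

The choice of big-endian for the read is arbitrary; reading as
little-endian would only swap the two constants.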

What do you think?

> +(define r6rs-utf16->string
> +  (case-lambda
> +    ((bv default-endianness)
> +     (let ((bom-endianness (read-bom16 bv)))
> +       (if (not bom-endianness)
> +           (utf16->string bv default-endianness)
> +           (substring/shared (utf16->string bv bom-endianness) 1))))

Better to use plain 'substring' here, I think.  The machinery of shared
substrings is more expensive, and unnecessary in this case.
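
Putting these suggestions together, a revised two-argument wrapper
might look roughly like the sketch below.  It is only an illustration:
the name 'utf16->string/bom' is hypothetical, and the code assumes that
Guile's existing 'utf16->string' decodes a leading BOM as an ordinary
U+FEFF character, as described in this thread.  As a defensive measure
that is not in the original patch, the first character is dropped only
if it really is U+FEFF.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical name; a sketch of the simplified wrapper, not the final patch.
(define (utf16->string/bom bv default-endianness)
  (let* (;; One 16-bit read; #f when the bytevector is too short.
         (n (and (>= (bytevector-length bv) 2)
                 (bytevector-u16-ref bv 0 (endianness big))))
         (bom (cond ((not n) #f)
                    ((= n #xFEFF) 'big)
                    ((= n #xFFFE) 'little)
                    (else #f)))
         ;; Decode with the BOM's endianness, else the caller's default.
         (str (utf16->string bv (or bom default-endianness))))
    ;; Assumption: the decoder keeps the BOM as a leading U+FEFF
    ;; character, so drop it with plain 'substring'.
    (if (and bom
             (> (string-length str) 0)
             (char=? (string-ref str 0) (integer->char #xFEFF)))
        (substring str 1)
        str)))
```

For example, (utf16->string/bom #vu8(#xFF #xFE 65 0) 'big) should decode
the payload as little-endian despite the 'big default, because the BOM
takes precedence.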

Otherwise, it looks good to me.  Would you like to propose a revised
patch?

Andy, what do you think?

       Mark




Information forwarded to bug-guile@HIDDEN:
bug#26058; Package guile. Full text available.

Message received at 26058 <at> debbugs.gnu.org:


Received: (at 26058) by debbugs.gnu.org; 16 Mar 2017 19:25:45 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Mar 16 15:25:45 2017
Received: from localhost ([127.0.0.1]:59272 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1cob1t-0001NI-AD
	for submit <at> debbugs.gnu.org; Thu, 16 Mar 2017 15:25:45 -0400
Received: from mail-wr0-f194.google.com ([209.85.128.194]:33183)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <taylanbayirli@HIDDEN>) id 1cob1r-0001N6-F5
 for 26058 <at> debbugs.gnu.org; Thu, 16 Mar 2017 15:25:43 -0400
Received: by mail-wr0-f194.google.com with SMTP id g10so7110754wrg.0
 for <26058 <at> debbugs.gnu.org>; Thu, 16 Mar 2017 12:25:43 -0700 (PDT)
X-Received: by 10.223.163.145 with SMTP id l17mr9500343wrb.103.1489692337722; 
 Thu, 16 Mar 2017 12:25:37 -0700 (PDT)
Received: from T420 ([2a02:908:c30:3540:221:ccff:fe66:68f0])
 by smtp.gmail.com with ESMTPSA id 36sm7253504wrk.57.2017.03.16.12.25.36
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Thu, 16 Mar 2017 12:25:36 -0700 (PDT)
From: taylanbayirli@HIDDEN (Taylan Ulrich =?utf-8?Q?Bay=C4=B1rl=C4=B1?=
 =?utf-8?Q?=2FKammer?=)
To: Andy Wingo <wingo@HIDDEN>
Subject: Re: bug#26058: utf16->string and utf32->string don't conform to R6RS
References: <87o9x83t0f.fsf@HIDDEN> <87shmhqqgd.fsf@HIDDEN>
 <87h92xyrmr.fsf@HIDDEN> <87bmt4rht1.fsf@HIDDEN>
 <87d1djzysb.fsf@HIDDEN> <877f3r7ti2.fsf@HIDDEN>
Date: Thu, 16 Mar 2017 20:34:14 +0100
In-Reply-To: <877f3r7ti2.fsf@HIDDEN> (Andy Wingo's message of "Tue, 14 Mar
 2017 16:44:37 +0100")
Message-ID: <87r31xdnih.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 26058
Cc: 26058 <at> debbugs.gnu.org
X-Spam-Score: -2.3 (--)

Andy Wingo <wingo@HIDDEN> writes:

> Adopting the behavior is more or less fine.  If it can be done while
> relying on the existing behavior, that is better than something ad-hoc
> in a module.

You mean somehow leveraging the existing BOM handling code of Guile
(found in the ports code) would be preferable to reimplementing it like
in this patch, correct?

In that light, I had this attempt:

(define r6rs-utf16->string
  (case-lambda
    ((bv default-endianness)
     (let* ((binary-port (open-bytevector-input-port bv))
            (transcoder (make-transcoder (utf-16-codec)))
            (utf16-port (transcoded-port binary-port transcoder)))
       ;; XXX how to set default-endianness for a port?
       (get-string-all utf16-port)))
    ((bv endianness endianness-mandatory?)
     (if endianness-mandatory?
         (utf16->string bv endianness)
         (r6rs-utf16->string bv endianness)))))

As commented in the first branch of the case-lambda, this does not yet
make use of the 'default-endianness' parameter to tell the port
transcoder (or whoever) what to do in case no BOM is found in the
stream.

From what I can tell, Guile is currently hardcoded to *transparently*
default to big-endian in ports.c, port_clear_stream_start_for_bom_read.

Is there a way to detect when Guile was unable to find a BOM?  (In that
case one could set the endianness explicitly to the desired default.)

Or do you see another way to implement this?

Thanks for the feedback!
Taylan


P.S.: Huge congrats on the big release. :-)




Information forwarded to bug-guile@HIDDEN:
bug#26058; Package guile. Full text available.

Message received at 26058 <at> debbugs.gnu.org:


Received: (at 26058) by debbugs.gnu.org; 14 Mar 2017 15:44:49 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Mar 14 11:44:49 2017
Received: from localhost ([127.0.0.1]:55824 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1cnocz-0003dm-99
	for submit <at> debbugs.gnu.org; Tue, 14 Mar 2017 11:44:49 -0400
Received: from pb-sasl2.pobox.com ([64.147.108.67]:61117
 helo=sasl.smtp.pobox.com) by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <wingo@HIDDEN>) id 1cnocx-0003de-0P
 for 26058 <at> debbugs.gnu.org; Tue, 14 Mar 2017 11:44:47 -0400
Received: from sasl.smtp.pobox.com (unknown [127.0.0.1])
 by pb-sasl2.pobox.com (Postfix) with ESMTP id 6B72769A41;
 Tue, 14 Mar 2017 11:44:45 -0400 (EDT)
Received: from pb-sasl2.nyi.icgroup.com (unknown [127.0.0.1])
 by pb-sasl2.pobox.com (Postfix) with ESMTP id 6574569A40;
 Tue, 14 Mar 2017 11:44:45 -0400 (EDT)
Received: from clucks (unknown [88.160.190.192])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by pb-sasl2.pobox.com (Postfix) with ESMTPSA id 6124969A3F;
 Tue, 14 Mar 2017 11:44:44 -0400 (EDT)
From: Andy Wingo <wingo@HIDDEN>
To: taylanbayirli@HIDDEN (Taylan Ulrich =?utf-8?Q?=22Bay=C4=B1rl=C4=B1?=
 =?utf-8?Q?=2FKammer=22?=)
Subject: Re: bug#26058: utf16->string and utf32->string don't conform to R6RS
References: <87o9x83t0f.fsf@HIDDEN> <87shmhqqgd.fsf@HIDDEN>
 <87h92xyrmr.fsf@HIDDEN> <87bmt4rht1.fsf@HIDDEN>
 <87d1djzysb.fsf@HIDDEN>
Date: Tue, 14 Mar 2017 16:44:37 +0100
In-Reply-To: <87d1djzysb.fsf@HIDDEN> ("Taylan Ulrich =?utf-8?Q?=5C=22Ba?=
 =?utf-8?Q?y=C4=B1rl=C4=B1=2FKammer=5C=22=22's?=
 message of "Tue, 14 Mar 2017 16:03:00 +0100")
Message-ID: <877f3r7ti2.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Pobox-Relay-ID: 25663666-08CD-11E7-AA15-85AB91A0D1B0-02397024!pb-sasl2.pobox.com
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 26058
Cc: 26058 <at> debbugs.gnu.org
X-Spam-Score: 0.0 (/)

On Tue 14 Mar 2017 16:03, taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer") writes:

> Andy Wingo <wingo@HIDDEN> writes:
>
>> On Mon 13 Mar 2017 19:10, taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer") writes:
>>
>>> If I do binary I/O, the following situations are possible:
>>>
>>> 1. I'm guaranteed to get any possible bytes that happen to form a valid
>>>    BOM at the start of the stream as-is in the returned bytevector; the
>>>    binary I/O interface doesn't see such bytes as anything special, as
>>>    it could simply be coincidence that the stream starts with such
>>>    bytes.
>>
>> (1).  But I thought this bug was about using a bytevector as a source
>> and then doing textual I/O on it, no?
>
> I have a feeling we're somehow talking past each other. :-) As far as
> I'm concerned, the bug is just that the procedures don't conform to the
> specification.
>
> It would of course be good if the behavior of these procedures was
> somehow in harmony with the behavior of I/O operations, but I don't see
> any issues arising from adopting the R6RS behavior of the procedures
> utf16->string and utf32->string.  Do you?

Adopting the behavior is more or less fine.  If it can be done while
relying on the existing behavior, that is better than something ad-hoc
in a module.

Andy




Information forwarded to bug-guile@HIDDEN:
bug#26058; Package guile. Full text available.

Message received at 26058 <at> debbugs.gnu.org:


Received: (at 26058) by debbugs.gnu.org; 14 Mar 2017 14:56:06 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Mar 14 10:56:05 2017
Received: from localhost ([127.0.0.1]:55801 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1cnnrp-00028W-LP
	for submit <at> debbugs.gnu.org; Tue, 14 Mar 2017 10:56:05 -0400
Received: from mail-wr0-f195.google.com ([209.85.128.195]:36046)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <taylanbayirli@HIDDEN>) id 1cnnrn-00027t-O0
 for 26058 <at> debbugs.gnu.org; Tue, 14 Mar 2017 10:56:04 -0400
Received: by mail-wr0-f195.google.com with SMTP id l37so24187448wrc.3
 for <26058 <at> debbugs.gnu.org>; Tue, 14 Mar 2017 07:56:03 -0700 (PDT)
X-Received: by 10.223.132.163 with SMTP id 32mr33789837wrg.147.1489503358000; 
 Tue, 14 Mar 2017 07:55:58 -0700 (PDT)
Received: from T420 ([2a02:908:c30:3540:221:ccff:fe66:68f0])
 by smtp.gmail.com with ESMTPSA id 11sm29454400wrb.10.2017.03.14.07.55.57
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Tue, 14 Mar 2017 07:55:57 -0700 (PDT)
From: taylanbayirli@HIDDEN (Taylan Ulrich =?utf-8?Q?Bay=C4=B1rl=C4=B1?=
 =?utf-8?Q?=2FKammer?=)
To: Andy Wingo <wingo@HIDDEN>
Subject: Re: bug#26058: utf16->string and utf32->string don't conform to R6RS
References: <87o9x83t0f.fsf@HIDDEN> <87shmhqqgd.fsf@HIDDEN>
 <87h92xyrmr.fsf@HIDDEN> <87bmt4rht1.fsf@HIDDEN>
Date: Tue, 14 Mar 2017 16:03:00 +0100
In-Reply-To: <87bmt4rht1.fsf@HIDDEN> (Andy Wingo's message of "Mon, 13 Mar
 2017 22:24:42 +0100")
Message-ID: <87d1djzysb.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 26058
Cc: 26058 <at> debbugs.gnu.org
X-Spam-Score: -2.3 (--)

Andy Wingo <wingo@HIDDEN> writes:

> On Mon 13 Mar 2017 19:10, taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer") writes:
>
>> If I do binary I/O, the following situations are possible:
>>
>> 1. I'm guaranteed to get any possible bytes that happen to form a valid
>>    BOM at the start of the stream as-is in the returned bytevector; the
>>    binary I/O interface doesn't see such bytes as anything special, as
>>    it could simply be coincidence that the stream starts with such
>>    bytes.
>
> (1).  But I thought this bug was about using a bytevector as a source
> and then doing textual I/O on it, no?

I have a feeling we're somehow talking past each other. :-) As far as
I'm concerned, the bug is just that the procedures don't conform to the
specification.

It would of course be good if the behavior of these procedures was
somehow in harmony with the behavior of I/O operations, but I don't see
any issues arising from adopting the R6RS behavior of the procedures
utf16->string and utf32->string.  Do you?

Taylan




Information forwarded to bug-guile@HIDDEN:
bug#26058; Package guile. Full text available.

Message received at 26058 <at> debbugs.gnu.org:


Received: (at 26058) by debbugs.gnu.org; 13 Mar 2017 21:24:54 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Mar 13 17:24:54 2017
Received: from localhost ([127.0.0.1]:54380 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1cnXSY-0000xq-Cj
	for submit <at> debbugs.gnu.org; Mon, 13 Mar 2017 17:24:54 -0400
Received: from pb-sasl2.pobox.com ([64.147.108.67]:62179
 helo=sasl.smtp.pobox.com) by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <wingo@HIDDEN>) id 1cnXSW-0000xi-G8
 for 26058 <at> debbugs.gnu.org; Mon, 13 Mar 2017 17:24:52 -0400
Received: from sasl.smtp.pobox.com (unknown [127.0.0.1])
 by pb-sasl2.pobox.com (Postfix) with ESMTP id 2488D66FED;
 Mon, 13 Mar 2017 17:24:52 -0400 (EDT)
Received: from pb-sasl2.nyi.icgroup.com (unknown [127.0.0.1])
 by pb-sasl2.pobox.com (Postfix) with ESMTP id 0B16866FEC;
 Mon, 13 Mar 2017 17:24:52 -0400 (EDT)
Received: from clucks (unknown [88.160.190.192])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by pb-sasl2.pobox.com (Postfix) with ESMTPSA id C4B3D66FEB;
 Mon, 13 Mar 2017 17:24:50 -0400 (EDT)
From: Andy Wingo <wingo@HIDDEN>
To: taylanbayirli@HIDDEN (Taylan Ulrich =?utf-8?Q?=22Bay=C4=B1rl=C4=B1?=
 =?utf-8?Q?=2FKammer=22?=)
Subject: Re: bug#26058: utf16->string and utf32->string don't conform to R6RS
References: <87o9x83t0f.fsf@HIDDEN> <87shmhqqgd.fsf@HIDDEN>
 <87h92xyrmr.fsf@HIDDEN>
Date: Mon, 13 Mar 2017 22:24:42 +0100
In-Reply-To: <87h92xyrmr.fsf@HIDDEN> ("Taylan Ulrich =?utf-8?Q?=5C=22Ba?=
 =?utf-8?Q?y=C4=B1rl=C4=B1=2FKammer=5C=22=22's?=
 message of "Mon, 13 Mar 2017 19:10:36 +0100")
Message-ID: <87bmt4rht1.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Pobox-Relay-ID: 7E3D53AE-0833-11E7-8681-85AB91A0D1B0-02397024!pb-sasl2.pobox.com
X-Spam-Score: -0.3 (/)
X-Debbugs-Envelope-To: 26058
Cc: 26058 <at> debbugs.gnu.org
X-Spam-Score: -0.3 (/)

On Mon 13 Mar 2017 19:10, taylanbayirli@HIDDEN (Taylan Ulrich "Bayırlı/Kammer") writes:

> If I do binary I/O, the following situations are possible:
>
> 1. I'm guaranteed to get any possible bytes that happen to form a valid
>    BOM at the start of the stream as-is in the returned bytevector; the
>    binary I/O interface doesn't see such bytes as anything special, as
>    it could simply be coincidence that the stream starts with such
>    bytes.
>
> 2. I'm guaranteed *not* to get bytes that form a BOM at the start of the
>    stream; instead they're consumed to set the port encoding for any
>    future text I/O.
>
> 3. The behavior is unspecified and either of the above may happen.

(1).  But I thought this bug was about using a bytevector as a source
and then doing textual I/O on it, no?

Andy




Information forwarded to bug-guile@HIDDEN:
bug#26058; Package guile. Full text available.

Message received at 26058 <at> debbugs.gnu.org:


Received: (at 26058) by debbugs.gnu.org; 13 Mar 2017 18:03:04 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Mar 13 14:03:04 2017
Received: from localhost ([127.0.0.1]:54180 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1cnUJE-0004bG-Ab
	for submit <at> debbugs.gnu.org; Mon, 13 Mar 2017 14:03:04 -0400
Received: from mail-wr0-f180.google.com ([209.85.128.180]:36442)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <taylanbayirli@HIDDEN>) id 1cnUJC-0004al-HE
 for 26058 <at> debbugs.gnu.org; Mon, 13 Mar 2017 14:03:03 -0400
Received: by mail-wr0-f180.google.com with SMTP id u108so108866328wrb.3
 for <26058 <at> debbugs.gnu.org>; Mon, 13 Mar 2017 11:03:02 -0700 (PDT)
X-Received: by 10.223.182.167 with SMTP id j39mr27559082wre.152.1489428176620; 
 Mon, 13 Mar 2017 11:02:56 -0700 (PDT)
Received: from T420 ([2a02:908:c30:3540:221:ccff:fe66:68f0])
 by smtp.gmail.com with ESMTPSA id j26sm25940374wrb.69.2017.03.13.11.02.55
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Mon, 13 Mar 2017 11:02:55 -0700 (PDT)
From: taylanbayirli@HIDDEN (Taylan Ulrich =?utf-8?Q?Bay=C4=B1rl=C4=B1?=
 =?utf-8?Q?=2FKammer?=)
To: Andy Wingo <wingo@HIDDEN>
Subject: Re: bug#26058: utf16->string and utf32->string don't conform to R6RS
References: <87o9x83t0f.fsf@HIDDEN> <87shmhqqgd.fsf@HIDDEN>
Date: Mon, 13 Mar 2017 19:10:36 +0100
In-Reply-To: <87shmhqqgd.fsf@HIDDEN> (Andy Wingo's message of "Mon, 13 Mar
 2017 14:03:14 +0100")
Message-ID: <87h92xyrmr.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 26058
Cc: 26058 <at> debbugs.gnu.org
X-Spam-Score: 0.0 (/)

Andy Wingo <wingo@HIDDEN> writes:

> Hi,
>
> this is a tricky area that is not so amenable to quick patches :) Have
> you looked into what Guile already does for byte-order marks?  Can you
> explain how the R6RS specification relates to this?
>
>   https://www.gnu.org/software/guile/manual/html_node/BOM-Handling.html
>
> Andy

Hmm, interesting.  I noticed the utf{16,32}->string procedures ignoring
a BOM at the start of the given bytevector, but didn't look at it from a
ports perspective.

TL;DR of the below though: the R6RS semantics offer a strict enrichment
of the feature-set of the utfX->string procedures relative to the Guile
semantics, so at most we would end up with spurious features.  (The
optional ability to handle any possible BOM at the start of the
bytevector, with a fall-back endianness in case none is found.)
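
To make those semantics concrete, here is a rough sketch for the UTF-32
case.  The names 'bom32-endianness' and 'utf32->string/r6rs' are
hypothetical, and the code assumes that Guile's existing 'utf32->string'
decodes a leading BOM as an ordinary U+FEFF character.  With two
arguments a BOM, if present, decides the endianness and is not returned
as a character; with a true third argument the given endianness is used
unconditionally and no BOM detection happens.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: detect a UTF-32 BOM with a single 32-bit read.
(define (bom32-endianness bv)
  (and (>= (bytevector-length bv) 4)
       (let ((n (bytevector-u32-ref bv 0 (endianness big))))
         (cond ((= n #x0000FEFF) 'big)      ; bytes 00 00 FE FF
               ((= n #xFFFE0000) 'little)   ; bytes FF FE 00 00
               (else #f)))))

;; Hypothetical name; a sketch of the R6RS-style behaviour.
(define utf32->string/r6rs
  (case-lambda
    ((bv default-endianness)
     (let* ((bom (bom32-endianness bv))
            (str (utf32->string bv (or bom default-endianness))))
       ;; Assumption: the decoder keeps the BOM as a leading U+FEFF.
       (if (and bom
                (> (string-length str) 0)
                (char=? (string-ref str 0) (integer->char #xFEFF)))
           (substring str 1)
           str)))
    ((bv given-endianness endianness-mandatory?)
     (if endianness-mandatory?
         ;; Mandatory: no BOM detection, decode exactly as requested.
         (utf32->string bv given-endianness)
         (utf32->string/r6rs bv given-endianness)))))
```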


That said, let's see...

If I do a textual read from a port, I already get a string and not a
bytevector, so the behavior of utfX->string operations is irrelevant.

If I do binary I/O, the following situations are possible:

1. I'm guaranteed to get any possible bytes that happen to form a valid
   BOM at the start of the stream as-is in the returned bytevector; the
   binary I/O interface doesn't see such bytes as anything special, as
   it could simply be coincidence that the stream starts with such
   bytes.

2. I'm guaranteed *not* to get bytes that form a BOM at the start of the
   stream; instead they're consumed to set the port encoding for any
   future text I/O.

3. The behavior is unspecified and either of the above may happen.

In the case of #1, it's probably good for utfX->string procedures to be
able to handle BOMs, but also allow explicitly ignoring any possible
BOM.  The R6RS semantics cover this.

In the case of #2, the utfX->string procedures don't need to be able to
handle BOMs as far as we're talking about passing them bytevectors
returned by port I/O, but it also doesn't hurt if they optionally
support it.  The R6RS semantics are fine here as well I think.

As for #3... first of all it's bad IMO; the behavior ought to be
specified. :-) But in any case, the additional features of the R6RS
semantics won't hurt.

WDYT?  As far as I understand the page you linked, Guile currently
implements #3, which I think is unfortunate but can kinda understand
too.  In any case, the additional R6RS features won't hurt, right?

Taylan




Information forwarded to bug-guile@HIDDEN:
bug#26058; Package guile. Full text available.

Message received at 26058 <at> debbugs.gnu.org:


From: Andy Wingo <wingo@HIDDEN>
To: taylanbayirli@HIDDEN ("Taylan Ulrich Bayırlı/Kammer")
Subject: Re: bug#26058: utf16->string and utf32->string don't conform to R6RS
References: <87o9x83t0f.fsf@HIDDEN>
Date: Mon, 13 Mar 2017 14:03:14 +0100
In-Reply-To: <87o9x83t0f.fsf@HIDDEN> (Taylan Ulrich Bayırlı/Kammer's message of
 "Sat, 11 Mar 2017 13:19:44 +0100")
Message-ID: <87shmhqqgd.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: 26058 <at> debbugs.gnu.org

On Sat 11 Mar 2017 13:19, taylanbayirli@HIDDEN ("Taylan Ulrich Bayırlı/Kammer") writes:

> See the R6RS Libraries document page 10.  The differences:
>
> - R6RS supports reading a BOM.
>
> - R6RS mandates an endianness argument to specify the behavior in the
>   absence of a BOM.
>
> - R6RS allows an optional third argument 'endianness-mandatory' to
>   explicitly ignore any possible BOM.
>
> Here's a quick patch on top of master.  I didn't test it thoroughly...

Hi,

this is a tricky area that is not so amenable to quick patches :) Have
you looked into what Guile already does for byte-order marks?  Can you
explain how the R6RS specification relates to this?

  https://www.gnu.org/software/guile/manual/html_node/BOM-Handling.html

Andy




Information forwarded to bug-guile@HIDDEN:
bug#26058; Package guile. Full text available.

Message received at submit <at> debbugs.gnu.org:


Date: Sat, 11 Mar 2017 13:19:44 +0100
Message-Id: <87o9x83t0f.fsf@HIDDEN>
From: taylanbayirli@HIDDEN ("Taylan Ulrich Bayırlı/Kammer")
To: bug-guile@HIDDEN
Subject: utf16->string and utf32->string don't conform to R6RS
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

See the R6RS Libraries document page 10.  The differences:

- R6RS supports reading a BOM.

- R6RS mandates an endianness argument to specify the behavior in the
  absence of a BOM.

- R6RS allows an optional third argument 'endianness-mandatory' to
  explicitly ignore any possible BOM.

Here's a quick patch on top of master.  I didn't test it thoroughly...


===File /home/taylan/src/guile/guile-master/0001-Fix-R6RS-utf16-string-and-utf32-string.patch===
From f51cd1d4884caafb1ed0072cd77c0e3145f34576 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Taylan=20Ulrich=20Bay=C4=B1rl=C4=B1/Kammer?=
 <taylanbayirli@HIDDEN>
Date: Fri, 10 Mar 2017 22:36:55 +0100
Subject: [PATCH] Fix R6RS utf16->string and utf32->string.

* module/rnrs/bytevectors.scm (read-bom16, read-bom32): New procedures.
(r6rs-utf16->string, r6rs-utf32->string): Ditto.
---
 module/rnrs/bytevectors.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/module/rnrs/bytevectors.scm b/module/rnrs/bytevectors.scm
index 9744359f0..997a8c9cb 100644
--- a/module/rnrs/bytevectors.scm
+++ b/module/rnrs/bytevectors.scm
@@ -69,7 +69,9 @@
            bytevector-ieee-double-native-set!
 
            string->utf8 string->utf16 string->utf32
-           utf8->string utf16->string utf32->string))
+           utf8->string
+           (r6rs-utf16->string . utf16->string)
+           (r6rs-utf32->string . utf32->string)))
 
 
 (load-extension (string-append "libguile-" (effective-version))
@@ -80,4 +82,52 @@
       `(quote ,sym)
       (error "unsupported endianness" sym)))
 
+(define (read-bom16 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1)))
+    (cond
+     ((and (= c0 #xFE) (= c1 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf16->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom16 bv)))
+       (if (not bom-endianness)
+           (utf16->string bv default-endianness)
+           (substring/shared (utf16->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf16->string bv endianness)
+         (r6rs-utf16->string bv endianness)))))
+
+(define (read-bom32 bv)
+  (let ((c0 (bytevector-u8-ref bv 0))
+        (c1 (bytevector-u8-ref bv 1))
+        (c2 (bytevector-u8-ref bv 2))
+        (c3 (bytevector-u8-ref bv 3)))
+    (cond
+     ((and (= c0 #x00) (= c1 #x00) (= c2 #xFE) (= c3 #xFF))
+      'big)
+     ((and (= c0 #xFF) (= c1 #xFE) (= c2 #x00) (= c3 #x00))
+      'little)
+     (else
+      #f))))
+
+(define r6rs-utf32->string
+  (case-lambda
+    ((bv default-endianness)
+     (let ((bom-endianness (read-bom32 bv)))
+       (if (not bom-endianness)
+           (utf32->string bv default-endianness)
+           (substring/shared (utf32->string bv bom-endianness) 1))))
+    ((bv endianness endianness-mandatory?)
+     (if endianness-mandatory?
+         (utf32->string bv endianness)
+         (r6rs-utf32->string bv endianness)))))
+
 ;;; bytevector.scm ends here
-- 
2.11.0

============================================================
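
One detail of the patch above: read-bom16 reads bytes 0 and 1
unconditionally, so bytevector-u8-ref raises an out-of-range error for
an empty or one-byte bytevector.  A minimal sketch of a length-guarded
variant follows; the name read-bom16/safe is hypothetical and is not
part of the patch.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical variant of read-bom16: return 'big or 'little if BV
;; starts with a UTF-16 BOM, otherwise #f.  Bytevectors shorter than
;; two bytes cannot hold a BOM, so they yield #f instead of an error.
(define (read-bom16/safe bv)
  (and (>= (bytevector-length bv) 2)
       (case (bytevector-u16-ref bv 0 (endianness big))
         ((#xFEFF) 'big)      ; bytes FE FF
         ((#xFFFE) 'little)   ; bytes FF FE
         (else #f))))
```

The same guard, with a minimum length of four bytes, would apply to
read-bom32.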




Acknowledgement sent to taylanbayirli@HIDDEN ("Taylan Ulrich Bayırlı/Kammer"):
New bug report received and forwarded. Copy sent to bug-guile@HIDDEN. Full text available.
Report forwarded to bug-guile@HIDDEN:
bug#26058; Package guile. Full text available.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.