GNU bug report logs - #30066
'get-bytevector-some' returns only 1 byte from unbuffered ports

Package: guile;

Reported by: ludo <at> gnu.org (Ludovic Courtès)

Date: Wed, 10 Jan 2018 15:03:02 UTC

Severity: normal

Tags: notabug

Done: ludo <at> gnu.org (Ludovic Courtès)

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 30066 in the body.
You can then email your comments to 30066 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Wed, 10 Jan 2018 15:03:02 GMT)

Acknowledgement sent to ludo <at> gnu.org (Ludovic Courtès):
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Wed, 10 Jan 2018 15:03:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: bug-guile <at> gnu.org
Cc: Andy Wingo <wingo <at> igalia.com>
Subject: 'get-bytevector-some' returns only 1 byte from unbuffered ports
Date: Wed, 10 Jan 2018 16:02:10 +0100
As discussed on IRC, ‘get-bytevector-some’ returns only 1 byte from
unbuffered ports:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (call-with-input-string "foo"
		       (lambda (port)
			 (setvbuf port _IONBF)
			 (get-bytevector-some port)))
$11 = #vu8(102)
scheme@(guile-user)> (version)
$12 = "2.2.3"
--8<---------------cut here---------------end--------------->8---

Strictly speaking it’s valid, but in practice it’s not very useful.

AFAICS, we lack a way to do the equivalent of:

  read (fd, buf, sizeof buf);

‘get-bytevector-n!’ is different because it blocks until it has read
COUNT bytes or EOF is reached.  So ‘get-bytevector-some’ could play this
role, but it doesn’t.
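
To make the distinction concrete, here is a minimal C sketch of the two read styles being contrasted. The helper names 'read_some' and 'read_exactly' are hypothetical and are not part of Guile; the sketch only assumes POSIX read(2) semantics, where a single call returns as soon as any bytes are available.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: a single read(2), retried only on EINTR.
   Returns as soon as any bytes are available, so the result may be
   smaller than CAP.  Returns 0 on EOF and -1 on error.  */
ssize_t
read_some (int fd, unsigned char *buf, size_t cap)
{
  ssize_t n;

  do
    n = read (fd, buf, cap);
  while (n < 0 && errno == EINTR);

  return n;
}

/* Hypothetical helper: keep reading until COUNT bytes have arrived or
   EOF is reached.  This mirrors the behaviour attributed to
   'get-bytevector-n!' above.  Returns the number of bytes stored,
   or -1 on error.  */
ssize_t
read_exactly (int fd, unsigned char *buf, size_t count)
{
  size_t total = 0;

  while (total < count)
    {
      ssize_t n = read_some (fd, buf + total, count - total);
      if (n < 0)
        return -1;
      if (n == 0)
        break;                  /* EOF before COUNT bytes */
      total += (size_t) n;
    }

  return (ssize_t) total;
}
```

On a pipe holding three bytes, 'read_some' with a 4096-byte buffer returns 3 right away, whereas 'read_exactly' asked for 10 bytes keeps waiting until more data or EOF arrives.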

Thoughts?

Ludo’.




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Wed, 10 Jan 2018 16:00:02 GMT)

Message #8 received at 30066 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: 30066 <at> debbugs.gnu.org
Cc: Andy Wingo <wingo <at> igalia.com>
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Wed, 10 Jan 2018 16:59:29 +0100
[Message part 1 (text/plain, inline)]
ludo <at> gnu.org (Ludovic Courtès) skribis:

> As discussed on IRC, ‘get-bytevector-some’ returns only 1 byte from
> unbuffered ports:

Here’s a tentative fix.  WDYT?

Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/libguile/ports.c b/libguile/ports.c
index 72bb73a01..002dd1433 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995-2001, 2003-2004, 2006-2017
+/* Copyright (C) 1995-2001, 2003-2004, 2006-2018
  * Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
@@ -1543,7 +1543,9 @@ scm_peek_byte_or_eof (SCM port)
   return peek_byte_or_eof (port, &buf, &cur);
 }
 
-static size_t
+/* Like read(2), read *up to* COUNT bytes from PORT into DST, starting
+   at OFFSET.  Return 0 upon EOF.  */
+size_t
 scm_i_read_bytes (SCM port, SCM dst, size_t start, size_t count)
 {
   size_t filled;
diff --git a/libguile/ports.h b/libguile/ports.h
index d131db5be..7aeacc8f9 100644
--- a/libguile/ports.h
+++ b/libguile/ports.h
@@ -4,7 +4,7 @@
 #define SCM_PORTS_H
 
 /* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004,
- *   2006, 2008, 2009, 2010, 2011, 2012, 2013, 2014 Free Software Foundation, Inc.
+ *   2006, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2018 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -69,6 +69,7 @@ SCM_INTERNAL SCM scm_i_port_weak_set;
 #define SCM_OPOUTPORTP(x) (SCM_OPPORTP (x) && SCM_OUTPUT_PORT_P (x))
 #define SCM_OPENP(x) (SCM_OPPORTP (x))
 #define SCM_CLOSEDP(x) (!SCM_OPENP (x))
+#define SCM_UNBUFFEREDP(x) (SCM_PORTP (x) && (SCM_CELL_WORD_0 (x) & SCM_BUF0))
 #define SCM_CLR_PORT_OPEN_FLAG(p) \
   SCM_SET_CELL_WORD_0 ((p), SCM_CELL_WORD_0 (p) & ~SCM_OPN)
 #ifdef BUILDING_LIBGUILE
@@ -185,6 +186,8 @@ SCM_API int scm_get_byte_or_eof (SCM port);
 SCM_API int scm_peek_byte_or_eof (SCM port);
 SCM_API size_t scm_c_read (SCM port, void *buffer, size_t size);
 SCM_API size_t scm_c_read_bytes (SCM port, SCM dst, size_t start, size_t count);
+SCM_INTERNAL size_t scm_i_read_bytes (SCM port, SCM dst, size_t start,
+				      size_t count);
 SCM_API scm_t_wchar scm_getc (SCM port);
 SCM_API SCM scm_read_char (SCM port);
 
diff --git a/libguile/r6rs-ports.c b/libguile/r6rs-ports.c
index e944c7aab..a3a67f3ca 100644
--- a/libguile/r6rs-ports.c
+++ b/libguile/r6rs-ports.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2009, 2010, 2011, 2013-2015 Free Software Foundation, Inc.
+/* Copyright (C) 2009, 2010, 2011, 2013-2015, 2018 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -487,16 +487,33 @@ SCM_DEFINE (scm_get_bytevector_some, "get-bytevector-some", 1, 0, 0,
 
   SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
 
-  buf = scm_fill_input (port, 0, &cur, &avail);
-  if (avail == 0)
+  if (SCM_UNBUFFEREDP (port))
     {
-      scm_port_buffer_set_has_eof_p (buf, SCM_BOOL_F);
-      return SCM_EOF_VAL;
+      size_t read;
+
+      bv = scm_c_make_bytevector (4096);
+      read = scm_i_read_bytes (port, bv, 0, SCM_BYTEVECTOR_LENGTH (bv));
+
+      if (read == 0)
+	return SCM_EOF_VAL;
+      else if (read < SCM_BYTEVECTOR_LENGTH (bv))
+	return scm_c_shrink_bytevector (bv, read);
+      else
+	return bv;
     }
+  else
+    {
+      buf = scm_fill_input (port, 0, &cur, &avail);
+      if (avail == 0)
+	{
+	  scm_port_buffer_set_has_eof_p (buf, SCM_BOOL_F);
+	  return SCM_EOF_VAL;
+	}
 
-  bv = scm_c_make_bytevector (avail);
-  scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
-                        avail, cur, avail);
+      bv = scm_c_make_bytevector (avail);
+      scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
+			    avail, cur, avail);
+    }
 
   return bv;
 }
diff --git a/test-suite/tests/r6rs-ports.test b/test-suite/tests/r6rs-ports.test
index ba3131f2e..7450b7217 100644
--- a/test-suite/tests/r6rs-ports.test
+++ b/test-suite/tests/r6rs-ports.test
@@ -1,6 +1,6 @@
 ;;;; r6rs-ports.test --- R6RS I/O port tests.   -*- coding: utf-8; -*-
 ;;;;
-;;;; Copyright (C) 2009-2012, 2013-2015 Free Software Foundation, Inc.
+;;;; Copyright (C) 2009-2012, 2013-2015, 2018 Free Software Foundation, Inc.
 ;;;; Ludovic Courtès
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
@@ -183,6 +183,15 @@
            (equal? (bytevector->u8-list bv)
                    (map char->integer (string->list str))))))
 
+  (pass-if-equal "get-bytevector-some [unbuffered port]"
+      (string->utf8 "Hello, world!")
+    ;; 'get-bytevector-some' used to return a single byte, see
+    ;; <https://bugs.gnu.org/30066>.
+    (call-with-input-string "Hello, world!"
+      (lambda (port)
+        (setvbuf port _IONBF)
+        (get-bytevector-some port))))
+
   (pass-if "get-bytevector-all"
     (let* ((str   "GNU Guile")
            (index 0)

Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Wed, 10 Jan 2018 16:33:01 GMT)

Message #11 received at 30066 <at> debbugs.gnu.org:

From: Andy Wingo <wingo <at> igalia.com>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Wed, 10 Jan 2018 17:32:04 +0100
On Wed 10 Jan 2018 16:59, ludo <at> gnu.org (Ludovic Courtès) writes:

> ludo <at> gnu.org (Ludovic Courtès) skribis:
>
>> As discussed on IRC, ‘get-bytevector-some’ returns only 1 byte from
>> unbuffered ports:
>
> Here’s a tentative fix.  WDYT?

Thanks!  Needs a little work though :)  Comments inline.

> --- a/libguile/ports.h
> +++ b/libguile/ports.h
> @@ -69,6 +69,7 @@ SCM_INTERNAL SCM scm_i_port_weak_set;
>  #define SCM_OPOUTPORTP(x) (SCM_OPPORTP (x) && SCM_OUTPUT_PORT_P (x))
>  #define SCM_OPENP(x) (SCM_OPPORTP (x))
>  #define SCM_CLOSEDP(x) (!SCM_OPENP (x))
> +#define SCM_UNBUFFEREDP(x) (SCM_PORTP (x) && (SCM_CELL_WORD_0 (x) & SCM_BUF0))
>  #define SCM_CLR_PORT_OPEN_FLAG(p) \
>    SCM_SET_CELL_WORD_0 ((p), SCM_CELL_WORD_0 (p) & ~SCM_OPN)
>  #ifdef BUILDING_LIBGUILE

Please guard this under #ifdef BUILDING_LIBGUILE.

> @@ -487,16 +487,33 @@ SCM_DEFINE (scm_get_bytevector_some, "get-bytevector-some", 1, 0, 0,
>  
>    SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
>  
> -  buf = scm_fill_input (port, 0, &cur, &avail);
> -  if (avail == 0)
> +  if (SCM_UNBUFFEREDP (port))
>      {
> -      scm_port_buffer_set_has_eof_p (buf, SCM_BOOL_F);
> -      return SCM_EOF_VAL;
> +      size_t read;
> +
> +      bv = scm_c_make_bytevector (4096);
> +      read = scm_i_read_bytes (port, bv, 0, SCM_BYTEVECTOR_LENGTH (bv));
> +
> +      if (read == 0)
> +	return SCM_EOF_VAL;
> +      else if (read < SCM_BYTEVECTOR_LENGTH (bv))
> +	return scm_c_shrink_bytevector (bv, read);
> +      else
> +	return bv;
>      }
> +  else
> +    {
> +      buf = scm_fill_input (port, 0, &cur, &avail);
> +      if (avail == 0)
> +	{
> +	  scm_port_buffer_set_has_eof_p (buf, SCM_BOOL_F);
> +	  return SCM_EOF_VAL;
> +	}
>  
> -  bv = scm_c_make_bytevector (avail);
> -  scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
> -                        avail, cur, avail);
> +      bv = scm_c_make_bytevector (avail);
> +      scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
> +			    avail, cur, avail);
> +    }
>  
>    return bv;
>  }

There are tabs in your code; would you mind doing only spaces?

A port being unbuffered doesn't mean that it has no bytes in its
buffer.  In particular, scm_unget_bytes may put bytes back into the
buffer.  Or, peek-u8 might fill this buffer with one byte.

Also, the port may have buffered write bytes (could be the port has
write buffering but no read buffering).  In that case (pt->rw_random)
you need to scm_flush().

I suggest taking the buffered bytes from the read buffer, if any.  Then
if the port is unbuffered, make a bytevector and call scm_i_read_bytes;
otherwise do the scm_fill_input path that's there already.

One more thing, if the port goes EOF, you need to
scm_port_buffer_set_has_eof_p.
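
The ordering suggested above can be sketched with a toy port in plain C. Everything here ('struct toy_port', 'toy_get_some', the 64-byte buffer) is hypothetical and is not Guile's port implementation. The sketch also assumes one particular reading of the suggestion: buffered bytes are returned on their own, without a further read. Write-buffer flushing is not modelled, and the EOF flag is only recorded, which may differ from what Guile does with its own flag.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Toy stand-in for a port; none of these names are Guile's.  */
struct toy_port
{
  int fd;                       /* underlying file descriptor */
  int unbuffered;               /* nonzero: comparable to _IONBF */
  unsigned char buf[64];        /* read buffer, also holds putback */
  size_t cur, end;              /* buffered bytes are buf[cur..end) */
  int has_eof;                  /* nonzero once a read returned EOF */
};

/* Store up to CAP bytes in OUT.  Return the number of bytes stored,
   0 on EOF, or -1 on error.  */
ssize_t
toy_get_some (struct toy_port *p, unsigned char *out, size_t cap)
{
  size_t avail = p->end - p->cur;
  ssize_t n;

  if (avail > 0)
    {
      /* 1. Peeked or ungot bytes come first; no read is issued.  */
      size_t take = avail < cap ? avail : cap;
      memcpy (out, p->buf + p->cur, take);
      p->cur += take;
      return (ssize_t) take;
    }

  if (p->unbuffered)
    /* 2. Unbuffered port: read straight into the caller's storage.  */
    n = read (p->fd, out, cap);
  else
    {
      /* 3. Buffered port: refill the buffer, then hand bytes over.  */
      n = read (p->fd, p->buf, sizeof p->buf);
      if (n > 0)
        {
          size_t take = (size_t) n < cap ? (size_t) n : cap;
          memcpy (out, p->buf, take);
          p->cur = take;
          p->end = (size_t) n;
          return (ssize_t) take;
        }
    }

  /* Record whether this read hit EOF.  */
  p->has_eof = (n == 0);
  return n;
}
```

With one peeked byte sitting in 'buf', the first call returns just that byte; the next call on an unbuffered port returns whatever a single read(2) delivers.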

Regards,

Andy




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Wed, 10 Jan 2018 16:59:01 GMT)

Message #14 received at 30066 <at> debbugs.gnu.org:

From: Nala Ginrut <nalaginrut <at> gmail.com>
To: Andy Wingo <wingo <at> igalia.com>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Thu, 11 Jan 2018 00:58:31 +0800
hi Andy and Ludo!

What if developers enable suspendable ports and set the port to
non-blocking? For example, in a non-blocking asynchronous server, I
register read/write waiters for suspendable ports, which save the
delimited continuation and then yield the current task.
In this situation, get-bytevector-n! may yield several times through
the registered waiter while reading n bytes, but from the caller's
perspective it eventually returns n bytes no matter how many times it
yielded.
But what about get-bytevector-some? Should it block at most once and
return the m bytes obtained by the first read?

Thanks!





Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Wed, 10 Jan 2018 17:27:02 GMT)

Message #17 received at 30066 <at> debbugs.gnu.org:

From: Andy Wingo <wingo <at> igalia.com>
To: Nala Ginrut <nalaginrut <at> gmail.com>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Wed, 10 Jan 2018 18:26:06 +0100
On Wed 10 Jan 2018 17:58, Nala Ginrut <nalaginrut <at> gmail.com> writes:

> hi Andy and Ludo!
>
> What if developers enable suspendable ports and set the port to
> non-blocking? For example, in a non-blocking asynchronous server, I
> register read/write waiters for suspendable ports, which save the
> delimited continuation and then yield the current task.
> In this situation, get-bytevector-n! may yield several times through
> the registered waiter while reading n bytes, but from the caller's
> perspective it eventually returns n bytes no matter how many times it
> yielded.
> But what about get-bytevector-some? Should it block at most once and
> return the m bytes obtained by the first read?

I think this is right.  At most one block.  FWIW we'd need to add
support for get-bytevector-some to (ice-9 suspendable-ports) to get this
to work.
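
The "at most one block" behaviour can be illustrated at the file-descriptor level. This is a sketch with hypothetical helper names ('make_nonblocking', 'read_some_nonblocking'); it is not how (ice-9 suspendable-ports) is implemented. The poll(2) call marks the point where a suspendable-ports waiter would yield instead of blocking the thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch FD to non-blocking mode.
   Returns -1 on error.  */
int
make_nonblocking (int fd)
{
  int flags = fcntl (fd, F_GETFL, 0);
  if (flags < 0)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: return whatever the first successful read
   delivers.  It waits only while no data at all is available, and
   never waits again in order to fill BUF.  Returns the byte count,
   0 on EOF, or -1 on error.  */
ssize_t
read_some_nonblocking (int fd, unsigned char *buf, size_t cap)
{
  for (;;)
    {
      ssize_t n = read (fd, buf, cap);
      if (n >= 0)
        return n;               /* data or EOF: done */
      if (errno == EINTR)
        continue;
      if (errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;

      /* Nothing to read yet.  A cooperative scheduler would yield
         here; this sketch simply waits for readability.  */
      struct pollfd pfd = { .fd = fd, .events = POLLIN };
      if (poll (&pfd, 1, -1) < 0 && errno != EINTR)
        return -1;
    }
}
```

By contrast, a loop in the style of get-bytevector-n! would return to the wait point after every short read until the requested count is reached.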

Andy




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Wed, 10 Jan 2018 17:44:01 GMT)

Message #20 received at 30066 <at> debbugs.gnu.org:

From: Nala Ginrut <nalaginrut <at> gmail.com>
To: Andy Wingo <wingo <at> igalia.com>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Thu, 11 Jan 2018 01:43:34 +0800
Ah, thanks for that work!

On Thu, Jan 11, 2018 at 1:26 AM, Andy Wingo <wingo <at> igalia.com> wrote:
> On Wed 10 Jan 2018 17:58, Nala Ginrut <nalaginrut <at> gmail.com> writes:
>
>> hi Andy and Ludo!
>>
>> What if developers enabled suspendable-ports and set the port to non-blocking?
>> For example, in the non-blocking asynchronous server, I registered
>> read/write waiter for suspendable-ports. And save
>> delimited-continuations then yield the current task.
>> In this situation, get-bytevector-n! will read n bytes with several
>> times yielding by the registered read-writer, from the caller's
>> perspective, get-bytevector-n! will return n bytes finally no matter
>> how many times it's yielded.
>> But how about the get-bytevector-some? Should it block just once and
>> return the first time read m bytes then return?
>
> I think this is right.  At most one block.  FWIW we'd need to add
> support for get-bytevector-some to (ice-9 suspendable-ports) to get this
> to work.
>
> Andy




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Thu, 11 Jan 2018 14:35:02 GMT)

Message #23 received at 30066 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Andy Wingo <wingo <at> igalia.com>
Cc: 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Thu, 11 Jan 2018 15:34:17 +0100
[Message part 1 (text/plain, inline)]
Hello,

Andy Wingo <wingo <at> igalia.com> skribis:

> There are tabs in your code; would you mind doing only spaces?
>
> A port being unbuffered doesn't mean that it has no bytes in its
> buffer.  In particular, scm_unget_bytes may put bytes back into the
> buffer.  Or, peek-u8 might fill this buffer with one byte.
>
> Also, the port may have buffered write bytes (could be the port has
> write buffering but no read buffering).  In that case (pt->rw_random)
> you need to scm_flush().
>
> I suggest taking the buffered bytes from the read buffer, if any.  Then
> if the port is unbuffered, make a bytevector and call scm_i_read_bytes;
> otherwise do the scm_fill_input path that's there already.
>
> One more thing, if the port goes EOF, you need to
> scm_port_buffer_set_has_eof_p.

I think the attached patch addresses these issues.  WDYT?

Thanks for the review!

Ludo’.

[0001-get-bytevector-some-reads-as-much-as-possible-withou.patch (text/x-patch, inline)]
From d3a60bac6c6aae62ced6eec21b3865caaab83bb8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo <at> gnu.org>
Date: Thu, 11 Jan 2018 15:29:55 +0100
Subject: [PATCH] 'get-bytevector-some' reads as much as possible without
 blocking.

Fixes <https://bugs.gnu.org/30066>.

* libguile/ports.c (scm_i_read_bytes): Remove 'static' keyword.
* libguile/ports.h (SCM_UNBUFFEREDP): New macro.
(scm_i_read_bytes): New declaration.
* libguile/r6rs-ports.c (scm_get_bytevector_some): When PORT is
unbuffered, invoke 'scm_i_read_bytes' to read as much as we can.
* test-suite/tests/r6rs-ports.test ("8.2.8 Binary Input")
["get-bytevector-some [unbuffered port]"]
["get-bytevector-some [unbuffered port, lookahead-u8]"]
["get-bytevector-some [unbuffered port, unget-bytevector]"]: New tests.
---
 libguile/ports.c                 |  6 ++++--
 libguile/ports.h                 |  7 ++++++-
 libguile/r6rs-ports.c            | 34 ++++++++++++++++++++++++++++------
 test-suite/tests/r6rs-ports.test | 36 ++++++++++++++++++++++++++++++++++--
 4 files changed, 72 insertions(+), 11 deletions(-)

diff --git a/libguile/ports.c b/libguile/ports.c
index 72bb73a01..002dd1433 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995-2001, 2003-2004, 2006-2017
+/* Copyright (C) 1995-2001, 2003-2004, 2006-2018
  * Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
@@ -1543,7 +1543,9 @@ scm_peek_byte_or_eof (SCM port)
   return peek_byte_or_eof (port, &buf, &cur);
 }
 
-static size_t
+/* Like read(2), read *up to* COUNT bytes from PORT into DST, starting
+   at OFFSET.  Return 0 upon EOF.  */
+size_t
 scm_i_read_bytes (SCM port, SCM dst, size_t start, size_t count)
 {
   size_t filled;
diff --git a/libguile/ports.h b/libguile/ports.h
index d131db5be..3fe64c27d 100644
--- a/libguile/ports.h
+++ b/libguile/ports.h
@@ -4,7 +4,7 @@
 #define SCM_PORTS_H
 
 /* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004,
- *   2006, 2008, 2009, 2010, 2011, 2012, 2013, 2014 Free Software Foundation, Inc.
+ *   2006, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2018 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -71,7 +71,10 @@ SCM_INTERNAL SCM scm_i_port_weak_set;
 #define SCM_CLOSEDP(x) (!SCM_OPENP (x))
 #define SCM_CLR_PORT_OPEN_FLAG(p) \
   SCM_SET_CELL_WORD_0 ((p), SCM_CELL_WORD_0 (p) & ~SCM_OPN)
+
 #ifdef BUILDING_LIBGUILE
+#define SCM_UNBUFFEREDP(x)				\
+  (SCM_PORTP (x) && (SCM_CELL_WORD_0 (x) & SCM_BUF0))
 #define SCM_PORT_FINALIZING_P(x) \
   (SCM_CELL_WORD_0 (x) & SCM_F_PORT_FINALIZING)
 #define SCM_SET_PORT_FINALIZING(p) \
@@ -185,6 +188,8 @@ SCM_API int scm_get_byte_or_eof (SCM port);
 SCM_API int scm_peek_byte_or_eof (SCM port);
 SCM_API size_t scm_c_read (SCM port, void *buffer, size_t size);
 SCM_API size_t scm_c_read_bytes (SCM port, SCM dst, size_t start, size_t count);
+SCM_INTERNAL size_t scm_i_read_bytes (SCM port, SCM dst, size_t start,
+				      size_t count);
 SCM_API scm_t_wchar scm_getc (SCM port);
 SCM_API SCM scm_read_char (SCM port);
 
diff --git a/libguile/r6rs-ports.c b/libguile/r6rs-ports.c
index e944c7aab..a3d638ca0 100644
--- a/libguile/r6rs-ports.c
+++ b/libguile/r6rs-ports.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2009, 2010, 2011, 2013-2015 Free Software Foundation, Inc.
+/* Copyright (C) 2009, 2010, 2011, 2013-2015, 2018 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -481,9 +481,9 @@ SCM_DEFINE (scm_get_bytevector_some, "get-bytevector-some", 1, 0, 0,
             "position to point just past these bytes.")
 #define FUNC_NAME s_scm_get_bytevector_some
 {
-  SCM buf;
+  SCM buf, bv;
   size_t cur, avail;
-  SCM bv;
+  const size_t max_buffer_size = 4096;
 
   SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
 
@@ -494,9 +494,31 @@ SCM_DEFINE (scm_get_bytevector_some, "get-bytevector-some", 1, 0, 0,
       return SCM_EOF_VAL;
     }
 
-  bv = scm_c_make_bytevector (avail);
-  scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
-                        avail, cur, avail);
+  if (SCM_UNBUFFEREDP (port) && (avail < max_buffer_size))
+    {
+      /* PORT is unbuffered.  Read as much as possible from PORT.  */
+      size_t read;
+
+      bv = scm_c_make_bytevector (max_buffer_size);
+      scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
+                            avail, cur, avail);
+
+      read = scm_i_read_bytes (port, bv, avail,
+                               SCM_BYTEVECTOR_LENGTH (bv) - avail);
+
+      if (read == 0)
+        scm_port_buffer_set_has_eof_p (buf, SCM_BOOL_F);
+
+      if (read + avail < SCM_BYTEVECTOR_LENGTH (bv))
+        bv = scm_c_shrink_bytevector (bv, read + avail);
+    }
+  else
+    {
+      /* Return what's already buffered.  */
+      bv = scm_c_make_bytevector (avail);
+      scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
+                            avail, cur, avail);
+    }
 
   return bv;
 }
diff --git a/test-suite/tests/r6rs-ports.test b/test-suite/tests/r6rs-ports.test
index ba3131f2e..d5476f20e 100644
--- a/test-suite/tests/r6rs-ports.test
+++ b/test-suite/tests/r6rs-ports.test
@@ -1,6 +1,6 @@
 ;;;; r6rs-ports.test --- R6RS I/O port tests.   -*- coding: utf-8; -*-
 ;;;;
-;;;; Copyright (C) 2009-2012, 2013-2015 Free Software Foundation, Inc.
+;;;; Copyright (C) 2009-2012, 2013-2015, 2018 Free Software Foundation, Inc.
 ;;;; Ludovic Courtès
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
@@ -26,7 +26,8 @@
   #:use-module (rnrs io ports)
   #:use-module (rnrs io simple)
   #:use-module (rnrs exceptions)
-  #:use-module (rnrs bytevectors))
+  #:use-module (rnrs bytevectors)
+  #:use-module ((ice-9 binary-ports) #:select (unget-bytevector)))
 
 (define-syntax pass-if-condition
   (syntax-rules ()
@@ -183,6 +184,37 @@
            (equal? (bytevector->u8-list bv)
                    (map char->integer (string->list str))))))
 
+  (pass-if-equal "get-bytevector-some [unbuffered port]"
+      (string->utf8 "Hello, world!")
+    ;; 'get-bytevector-some' used to return a single byte, see
+    ;; <https://bugs.gnu.org/30066>.
+    (call-with-input-string "Hello, world!"
+      (lambda (port)
+        (setvbuf port _IONBF)
+        (get-bytevector-some port))))
+
+  (pass-if-equal "get-bytevector-some [unbuffered port, lookahead-u8]"
+      (string->utf8 "Hello, world!")
+    (call-with-input-string "Hello, world!"
+      (lambda (port)
+        (setvbuf port _IONBF)
+
+        ;; 'lookahead-u8' fills in PORT's 1-byte buffer.  Yet,
+        ;; 'get-bytevector-some' should return the whole thing.
+        (and (eqv? (lookahead-u8 port) (char->integer #\H))
+             (get-bytevector-some port)))))
+
+  (pass-if-equal "get-bytevector-some [unbuffered port, unget-bytevector]"
+      (string->utf8 "Hello")
+    (call-with-input-string "Hello, world!"
+      (lambda (port)
+        (setvbuf port _IONBF)
+        ;; 'unget-bytevector' fills the putback buffer, and
+        ;; 'get-bytevector-some' should get data from there.
+        (unget-bytevector port (get-bytevector-all port)
+                          0 5)
+        (get-bytevector-some port))))
+
   (pass-if "get-bytevector-all"
     (let* ((str   "GNU Guile")
            (index 0)
-- 
2.15.1


Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Thu, 11 Jan 2018 19:58:02 GMT)

Message #26 received at 30066 <at> debbugs.gnu.org:

From: Mark H Weaver <mhw <at> netris.org>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: Andy Wingo <wingo <at> igalia.com>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Thu, 11 Jan 2018 14:55:07 -0500
Hi Ludovic,

ludo <at> gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo <at> igalia.com> skribis:
>
>> I suggest taking the buffered bytes from the read buffer, if any.  Then
>> if the port is unbuffered, make a bytevector and call scm_i_read_bytes;
>> otherwise do the scm_fill_input path that's there already.
>>
>> One more thing, if the port goes EOF, you need to
>> scm_port_buffer_set_has_eof_p.
>
> I think the attached patch addresses these issues.  WDYT?

[...]

> diff --git a/libguile/r6rs-ports.c b/libguile/r6rs-ports.c
> index e944c7aab..a3d638ca0 100644
> --- a/libguile/r6rs-ports.c
> +++ b/libguile/r6rs-ports.c
> @@ -1,4 +1,4 @@
> -/* Copyright (C) 2009, 2010, 2011, 2013-2015 Free Software Foundation, Inc.
> +/* Copyright (C) 2009, 2010, 2011, 2013-2015, 2018 Free Software Foundation, Inc.
>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public License
> @@ -481,9 +481,9 @@ SCM_DEFINE (scm_get_bytevector_some, "get-bytevector-some", 1, 0, 0,
>              "position to point just past these bytes.")
>  #define FUNC_NAME s_scm_get_bytevector_some
>  {
> -  SCM buf;
> +  SCM buf, bv;
>    size_t cur, avail;
> -  SCM bv;
> +  const size_t max_buffer_size = 4096;
>  
>    SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
>  
> @@ -494,9 +494,31 @@ SCM_DEFINE (scm_get_bytevector_some, "get-bytevector-some", 1, 0, 0,
>        return SCM_EOF_VAL;
>      }
>  
> -  bv = scm_c_make_bytevector (avail);
> -  scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
> -                        avail, cur, avail);
> +  if (SCM_UNBUFFEREDP (port) && (avail < max_buffer_size))
> +    {
> +      /* PORT is unbuffered.  Read as much as possible from PORT.  */
> +      size_t read;
> +
> +      bv = scm_c_make_bytevector (max_buffer_size);
> +      scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
> +                            avail, cur, avail);
> +
> +      read = scm_i_read_bytes (port, bv, avail,
> +                               SCM_BYTEVECTOR_LENGTH (bv) - avail);

Here's the R6RS specification for 'get-bytevector-some':

  "Reads from BINARY-INPUT-PORT, blocking as necessary, until bytes are
   available from BINARY-INPUT-PORT or until an end of file is reached.
   If bytes become available, 'get-bytevector-some' returns a freshly
   allocated bytevector containing the initial available bytes (at least
   one), and it updates BINARY-INPUT-PORT to point just past these
   bytes.  If no input bytes are seen before an end of file is reached,
   the end-of-file object is returned."

By my reading of this, we should block only if necessary to ensure that
we return at least one byte (or EOF).  In other words, if we can return
at least one byte (or EOF), then we must not block, which means that we
must not initiate another 'read'.
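
The difference between the condition in the quoted patch and this reading of the specification comes down to two predicates. The function names below are hypothetical and exist only for illustration; 'avail' stands for the number of bytes already buffered and 'max' for the patch's 4096-byte limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Condition from the quoted patch: issue another read whenever fewer
   than MAX bytes are already buffered.  */
bool
patch_reads_more (bool unbuffered, size_t avail, size_t max)
{
  return unbuffered && avail < max;
}

/* Condition under the reading above: issue a read only when nothing
   could be returned otherwise.  */
bool
spec_reads_more (bool unbuffered, size_t avail)
{
  return unbuffered && avail == 0;
}
```

With one byte already buffered on an unbuffered port, the first predicate asks for another read, which may block, while the second returns that byte immediately.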

Out of curiosity, is there a reason why you're using an unbuffered port
in your use case?

       Mark




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Thu, 11 Jan 2018 21:03:01 GMT)

Message #29 received at 30066 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw <at> netris.org>
Cc: Andy Wingo <wingo <at> igalia.com>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Thu, 11 Jan 2018 22:02:29 +0100
Hello,

Mark H Weaver <mhw <at> netris.org> skribis:

> ludo <at> gnu.org (Ludovic Courtès) writes:

[...]

>> +  if (SCM_UNBUFFEREDP (port) && (avail < max_buffer_size))
>> +    {
>> +      /* PORT is unbuffered.  Read as much as possible from PORT.  */
>> +      size_t read;
>> +
>> +      bv = scm_c_make_bytevector (max_buffer_size);
>> +      scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
>> +                            avail, cur, avail);
>> +
>> +      read = scm_i_read_bytes (port, bv, avail,
>> +                               SCM_BYTEVECTOR_LENGTH (bv) - avail);
>
> Here's the R6RS specification for 'get-bytevector-some':
>
>   "Reads from BINARY-INPUT-PORT, blocking as necessary, until bytes are
>    available from BINARY-INPUT-PORT or until an end of file is reached.
>    If bytes become available, 'get-bytevector-some' returns a freshly
>    allocated bytevector containing the initial available bytes (at least
>    one), and it updates BINARY-INPUT-PORT to point just past these
>    bytes.  If no input bytes are seen before an end of file is reached,
>    the end-of-file object is returned."
>
> By my reading of this, we should block only if necessary to ensure that
> we return at least one byte (or EOF).  In other words, if we can return
> at least one byte (or EOF), then we must not block, which means that we
> must not initiate another 'read'.

Indeed.  So perhaps the condition above should be changed to:

  if (SCM_UNBUFFEREDP (port) && (avail == 0))

?

> Out of curiosity, is there a reason why you're using an unbuffered port
> in your use case?

It’s to implement redirect à la socat:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=17af5d51de7c40756a4a39d336f81681de2ba447

Thanks,
Ludo’.




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Thu, 11 Jan 2018 21:57:02 GMT)

Message #32 received at 30066 <at> debbugs.gnu.org:

From: Mark H Weaver <mhw <at> netris.org>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: Andy Wingo <wingo <at> igalia.com>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Thu, 11 Jan 2018 16:55:38 -0500
ludo <at> gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw <at> netris.org> skribis:
>
>> ludo <at> gnu.org (Ludovic Courtès) writes:
>
> [...]
>
>>> +  if (SCM_UNBUFFEREDP (port) && (avail < max_buffer_size))
>>> +    {
>>> +      /* PORT is unbuffered.  Read as much as possible from PORT.  */
>>> +      size_t read;
>>> +
>>> +      bv = scm_c_make_bytevector (max_buffer_size);
>>> +      scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
>>> +                            avail, cur, avail);
>>> +
>>> +      read = scm_i_read_bytes (port, bv, avail,
>>> +                               SCM_BYTEVECTOR_LENGTH (bv) - avail);
>>
>> Here's the R6RS specification for 'get-bytevector-some':
>>
>>   "Reads from BINARY-INPUT-PORT, blocking as necessary, until bytes are
>>    available from BINARY-INPUT-PORT or until an end of file is reached.
>>    If bytes become available, 'get-bytevector-some' returns a freshly
>>    allocated bytevector containing the initial available bytes (at least
>>    one), and it updates BINARY-INPUT-PORT to point just past these
>>    bytes.  If no input bytes are seen before an end of file is reached,
>>    the end-of-file object is returned."
>>
>> By my reading of this, we should block only if necessary to ensure that
>> we return at least one byte (or EOF).  In other words, if we can return
>> at least one byte (or EOF), then we must not block, which means that we
>> must not initiate another 'read'.
>
> Indeed.  So perhaps the condition above should be changed to:
>
>   if (SCM_UNBUFFEREDP (port) && (avail == 0))
>
> ?

That won't work, because the earlier call to 'scm_fill_input' will have
already initiated a 'read' if the buffer was empty.  The read buffer
size will determine the maximum number of bytes read, which will be 1 in
the case of an unbuffered port.  So, at the point of this condition,
'avail == 0' will occur only if EOF was encountered, in which case you
must return EOF without attempting another 'read'.

In order to avoid unnecessary blocking, there must be only one 'read'
call, and it must be initiated only if the buffer was already empty.

So, in order to accomplish your goal here, I don't see how you can use
'scm_fill_input', unless you temporarily increase the size of the read
buffer beforehand.

Instead, I think you need to first check if the read buffer contains any
bytes.  If so, empty the buffer and return them.  If the buffer is
empty, the next thing to check is 'scm_port_buffer_has_eof_p'.  If it's
set, then clear that flag and return EOF.

Otherwise, if the buffer is empty and 'scm_port_buffer_has_eof_p' is
false, then you must do what 'scm_fill_input' would have done, except
using your larger buffer instead of the port's internal read buffer.  In
particular, you must first switch the port to "reading" mode, flushing
the write buffer if 'rw_random' is set.

Also, I'd prefer to move this code to ports.c in order to avoid adding
more internal declarations to ports.h and changing more functions from
'static' to global functions.
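
The three steps described above can be sketched in C against a toy port model. Everything named here is hypothetical: 'struct toy_port', 'toy_get_some' and the one-byte buffer are illustrative stand-ins, not Guile's internal API. The sketch also leaves out switching the port to "reading" mode and flushing the write buffer, and it does not retry on EINTR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a port; not Guile's real data structure.  */
struct toy_port
{
  int fd;
  unsigned char buf[1];   /* "unbuffered": a one-byte read buffer */
  size_t cur, end;        /* buffered bytes live in buf[cur..end) */
  bool has_eof;           /* an earlier read already hit EOF */
};

/* Return the number of bytes stored in OUT (at least 1), 0 for EOF,
   or -1 on a read error.  At most one read(2) is performed, and only
   when nothing is buffered.  */
static ssize_t
toy_get_some (struct toy_port *port, unsigned char *out, size_t out_size)
{
  size_t avail = port->end - port->cur;

  if (avail > 0)
    {
      /* Step 1: buffered bytes exist; hand them over without reading.  */
      if (avail > out_size)
        avail = out_size;
      memcpy (out, port->buf + port->cur, avail);
      port->cur += avail;
      return (ssize_t) avail;
    }

  if (port->has_eof)
    {
      /* Step 2: a pending EOF is reported once, then cleared.  */
      port->has_eof = false;
      return 0;
    }

  /* Step 3: exactly one read, straight into the caller's larger buffer
     rather than the port's own one-byte buffer.  */
  return read (port->fd, out, out_size);
}
```

With a pipe holding "foo" and a 4096-byte OUT, one call to this sketch would return all three bytes instead of one, while still never blocking once something can be returned.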

>> Out of curiosity, is there a reason why you're using an unbuffered port
>> in your use case?
>
> It’s to implement redirect à la socat:
>
>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=17af5d51de7c40756a4a39d336f81681de2ba447

Why is an unbuffered port being used here?  Can we change it to a
buffered port?

      Mark




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Fri, 12 Jan 2018 09:02:01 GMT)

Message #35 received at 30066 <at> debbugs.gnu.org:

From: Andy Wingo <wingo <at> igalia.com>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Fri, 12 Jan 2018 10:01:11 +0100
On Thu 11 Jan 2018 22:55, Mark H Weaver <mhw <at> netris.org> writes:

> ludo <at> gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw <at> netris.org> skribis:
>>
>>> ludo <at> gnu.org (Ludovic Courtès) writes:
>>
>> [...]
>>
>>>> +  if (SCM_UNBUFFEREDP (port) && (avail < max_buffer_size))
>>>> +    {
>>>> +      /* PORT is unbuffered.  Read as much as possible from PORT.  */
>>>> +      size_t read;
>>>> +
>>>> +      bv = scm_c_make_bytevector (max_buffer_size);
>>>> +      scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
>>>> +                            avail, cur, avail);
>>>> +
>>>> +      read = scm_i_read_bytes (port, bv, avail,
>>>> +                               SCM_BYTEVECTOR_LENGTH (bv) - avail);
>>>
>>> Here's the R6RS specification for 'get-bytevector-some':
>>>
>>>   "Reads from BINARY-INPUT-PORT, blocking as necessary, until bytes are
>>>    available from BINARY-INPUT-PORT or until an end of file is reached.
>>>    If bytes become available, 'get-bytevector-some' returns a freshly
>>>    allocated bytevector containing the initial available bytes (at least
>>>    one), and it updates BINARY-INPUT-PORT to point just past these
>>>    bytes.  If no input bytes are seen before an end of file is reached,
>>>    the end-of-file object is returned."
>>>
>>> By my reading of this, we should block only if necessary to ensure that
>>> we return at least one byte (or EOF).  In other words, if we can return
>>> at least one byte (or EOF), then we must not block, which means that we
>>> must not initiate another 'read'.
>>
>> Indeed.  So perhaps the condition above should be changed to:
>>
>>   if (SCM_UNBUFFEREDP (port) && (avail == 0))
>>
>> ?
>
> That won't work, because the earlier call to 'scm_fill_input' will have
> already initiated a 'read' if the buffer was empty.  The read buffer
> size will determine the maximum number of bytes read, which will be 1 in
> the case of an unbuffered port.  So, at the point of this condition,
> 'avail == 0' will occur only if EOF was encountered, in which case you
> must return EOF without attempting another 'read'.
>
> In order to avoid unnecessary blocking, there must be only one 'read'
> call, and it must be initiated only if the buffer was already empty.
>
> So, in order to accomplish your goal here, I don't see how you can use
> 'scm_fill_input', unless you temporarily increase the size of the read
> buffer beforehand.
>
> Instead, I think you need to first check if the read buffer contains any
> bytes.  If so, empty the buffer and return them.  If the buffer is
> empty, the next thing to check is 'scm_port_buffer_has_eof_p'.  If it's
> set, then clear that flag and return EOF.
>
> Otherwise, if the buffer is empty and 'scm_port_buffer_has_eof_p' is
> false, then you must do what 'scm_fill_input' would have done, except
> using your larger buffer instead of the port's internal read buffer.  In
> particular, you must first switch the port to "reading" mode, flushing
> the write buffer if 'rw_random' is set.
>
> Also, I'd prefer to move this code to ports.c in order to avoid adding
> more internal declarations to ports.h and changing more functions from
> 'static' to global functions.

I agree with Mark here -- thanks for the close review.

>>> Out of curiosity, is there a reason why you're using an unbuffered port
>>> in your use case?
>>
>> It’s to implement redirect à la socat:
>>
>>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=17af5d51de7c40756a4a39d336f81681de2ba447
>
> Why is an unbuffered port being used here?  Can we change it to a
> buffered port?

This was also a question I had!  If you make it a buffered port at 4096
bytes (for example), then get-bytevector-some works exactly like you
want it to, no?

Andy




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Fri, 12 Jan 2018 10:16:01 GMT)

Message #38 received at 30066 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Andy Wingo <wingo <at> igalia.com>
Cc: Mark H Weaver <mhw <at> netris.org>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Fri, 12 Jan 2018 11:15:08 +0100
Andy Wingo <wingo <at> igalia.com> skribis:

> On Thu 11 Jan 2018 22:55, Mark H Weaver <mhw <at> netris.org> writes:

[...]

>>>> Out of curiosity, is there a reason why you're using an unbuffered port
>>>> in your use case?
>>>
>>> It’s to implement redirect à la socat:
>>>
>>>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=17af5d51de7c40756a4a39d336f81681de2ba447
>>
>> Why is an unbuffered port being used here?  Can we change it to a
>> buffered port?
>
> This was also a question I had!  If you make it a buffered port at 4096
> bytes (for example), then get-bytevector-some works exactly like you
> want it to, no?

It might work, but that’s more by chance, no?

I mean, if we declare the port as buffered, then we give the I/O
routines the “right” to fill in that buffer.

WDYT?

Thanks,
Ludo’.




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Fri, 12 Jan 2018 10:34:01 GMT)

Message #41 received at 30066 <at> debbugs.gnu.org:

From: Andy Wingo <wingo <at> igalia.com>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: Mark H Weaver <mhw <at> netris.org>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Fri, 12 Jan 2018 11:33:32 +0100
On Fri 12 Jan 2018 11:15, ludo <at> gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo <at> igalia.com> skribis:
>
>> On Thu 11 Jan 2018 22:55, Mark H Weaver <mhw <at> netris.org> writes:
>
> [...]
>
>>>>> Out of curiosity, is there a reason why you're using an unbuffered port
>>>>> in your use case?
>>>>
>>>> It’s to implement redirect à la socat:
>>>>
>>>>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=17af5d51de7c40756a4a39d336f81681de2ba447
>>>
>>> Why is an unbuffered port being used here?  Can we change it to a
>>> buffered port?
>>
>> This was also a question I had!  If you make it a buffered port at 4096
>> bytes (for example), then get-bytevector-some works exactly like you
>> want it to, no?
>
> It might work, but that’s more by chance no?

No, it is reliable.  get-bytevector-some on a buffered port must either
return all the buffered bytes or perform exactly one read (up to the
buffer size) and either return those bytes or EOF.  As far as I
understand, that is exactly what you want.

Using buffered ports has two additional advantages: you get to specify
the read size, and returned bytevectors can be allocated to precisely
the right size (no need to overallocate then truncate).
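
As a rough illustration of that behavior, here is a C model of a buffered port. The names 'struct buffered_port', 'buffered_get_some' and 'READ_SIZE' are hypothetical and do not come from Guile; the sketch only mirrors the contract described above: return what is buffered, or else do a single read bounded by the buffer size, then allocate a result of exactly the right length.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define READ_SIZE 4096   /* buffer size chosen by the caller (example value) */

/* Hypothetical buffered port; not Guile's real data structure.  */
struct buffered_port
{
  int fd;
  unsigned char buf[READ_SIZE];
  size_t cur, end;       /* buffered bytes live in buf[cur..end) */
};

/* Store in *RESULT a freshly allocated copy of the available bytes and
   return their count.  Return 0 at EOF and -1 on error; *RESULT is NULL
   in both of those cases.  */
static ssize_t
buffered_get_some (struct buffered_port *port, unsigned char **result)
{
  *result = NULL;

  if (port->cur == port->end)
    {
      /* Buffer empty: exactly one read, bounded by the buffer size.  */
      ssize_t n = read (port->fd, port->buf, sizeof port->buf);
      if (n <= 0)
        return n;
      port->cur = 0;
      port->end = (size_t) n;
    }

  /* The byte count is known before allocating, so the result never has
     to be over-allocated and then truncated.  */
  size_t avail = port->end - port->cur;
  *result = malloc (avail);
  if (*result == NULL)
    return -1;
  memcpy (*result, port->buf + port->cur, avail);
  port->cur = port->end;
  return (ssize_t) avail;
}
```

The caller owns the returned buffer and is expected to free it.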

Andy




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Sat, 13 Jan 2018 20:54:01 GMT)

Message #44 received at 30066 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Andy Wingo <wingo <at> igalia.com>
Cc: Mark H Weaver <mhw <at> netris.org>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Sat, 13 Jan 2018 21:53:34 +0100
Hey,

Andy Wingo <wingo <at> igalia.com> skribis:

> On Fri 12 Jan 2018 11:15, ludo <at> gnu.org (Ludovic Courtès) writes:
>
>> Andy Wingo <wingo <at> igalia.com> skribis:
>>
>>> On Thu 11 Jan 2018 22:55, Mark H Weaver <mhw <at> netris.org> writes:
>>
>> [...]
>>
>>>>>> Out of curiosity, is there a reason why you're using an unbuffered port
>>>>>> in your use case?
>>>>>
>>>>> It’s to implement redirect à la socat:
>>>>>
>>>>>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=17af5d51de7c40756a4a39d336f81681de2ba447
>>>>
>>>> Why is an unbuffered port being used here?  Can we change it to a
>>>> buffered port?
>>>
>>> This was also a question I had!  If you make it a buffered port at 4096
>>> bytes (for example), then get-bytevector-some works exactly like you
>>> want it to, no?
>>
>> It might work, but that’s more by chance no?
>
> No, it is reliable.  get-bytevector-some on a buffered port must either
> return all the buffered bytes or perform exactly one read (up to the
> buffer size) and either return those bytes or EOF.  As far as I
> understand, that is exactly what you want.

Indeed, that works well, thanks!  So, after all, problem solved?

I think the confusion for me comes from the fact that we don’t have a
FILE*/fd distinction like in C.  It’s as if we were always using FILE*
in the sense that I’m never sure what’s going to happen or whether a
particular behavior can be relied on.

Thank you,
Ludo’.




Information forwarded to bug-guile <at> gnu.org:
bug#30066; Package guile. (Fri, 16 Feb 2018 13:20:01 GMT)

Message #47 received at 30066 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Andy Wingo <wingo <at> igalia.com>
Cc: Mark H Weaver <mhw <at> netris.org>, 30066 <at> debbugs.gnu.org
Subject: Re: bug#30066: 'get-bytevector-some' returns only 1 byte from
 unbuffered ports
Date: Fri, 16 Feb 2018 14:19:50 +0100
ludo <at> gnu.org (Ludovic Courtès) skribis:

> Andy Wingo <wingo <at> igalia.com> skribis:
>
>> On Fri 12 Jan 2018 11:15, ludo <at> gnu.org (Ludovic Courtès) writes:
>>
>>> Andy Wingo <wingo <at> igalia.com> skribis:
>>>
>>>> On Thu 11 Jan 2018 22:55, Mark H Weaver <mhw <at> netris.org> writes:
>>>
>>> [...]
>>>
>>>>>>> Out of curiosity, is there a reason why you're using an unbuffered port
>>>>>>> in your use case?
>>>>>>
>>>>>> It’s to implement redirect à la socat:
>>>>>>
>>>>>>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=17af5d51de7c40756a4a39d336f81681de2ba447
>>>>>
>>>>> Why is an unbuffered port being used here?  Can we change it to a
>>>>> buffered port?
>>>>
>>>> This was also a question I had!  If you make it a buffered port at 4096
>>>> bytes (for example), then get-bytevector-some works exactly like you
>>>> want it to, no?
>>>
>>> It might work, but that’s more by chance no?
>>
>> No, it is reliable.  get-bytevector-some on a buffered port must either
>> return all the buffered bytes or perform exactly one read (up to the
>> buffer size) and either return those bytes or EOF.  As far as I
>> understand, that is exactly what you want.
>
> Indeed, that works well, thanks!  So, after all, problem solved?

I’m closing this as not-a-bug.

Ludo’.




Added tag(s) notabug. Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Fri, 16 Feb 2018 13:20:02 GMT)

Bug closed; send any further explanations to 30066 <at> debbugs.gnu.org and ludo <at> gnu.org (Ludovic Courtès). Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Fri, 16 Feb 2018 13:20:02 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 17 Mar 2018 11:24:05 GMT)

This bug report was last modified 6 years and 41 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.