GNU bug report logs - #67245
[PATCH] store: Use a non-blocking socket for store connections.


Package: guix-patches

Reported by: Christopher Baines <mail <at> cbaines.net>

Date: Fri, 17 Nov 2023 18:06:01 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 67245 AT debbugs.gnu.org.



Report forwarded to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, rekado <at> elephly.net, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#67245; Package guix-patches. (Fri, 17 Nov 2023 18:06:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Christopher Baines <mail <at> cbaines.net>
To: guix-patches <at> gnu.org
Subject: [PATCH] store: Use a non-blocking socket for store connections.
Date: Fri, 17 Nov 2023 18:05:14 +0000

For some applications, it's important to do this here rather than just making
the socket non-blocking after the connection is established because there can
be I/O on the socket that will block during the handshake.

I've noticed this blocking during the handshake causing issues in the build
coordinator for example.

* guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass
SOCK_NONBLOCK when calling socket.

Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf
---
 guix/store.scm | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/guix/store.scm b/guix/store.scm
index f8e77b2cd9..216be98c05 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
 '&store-connection-error' upon error."
   (let ((s (with-fluids ((%default-port-encoding #f))
              ;; This trick allows use of the `scm_c_read' optimization.
-             (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
+             (socket PF_UNIX
+                     (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
+                     0)))
         (a (make-socket-address PF_UNIX file)))
 
     (system-error-to-connection-error file
@@ -488,7 +490,8 @@ (define (open-inet-socket host port)
       ((ai rest ...)
        (let ((s (socket (addrinfo:fam ai)
                         ;; TCP/IP only
-                        (logior SOCK_STREAM SOCK_CLOEXEC) IPPROTO_IP)))
+                        (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
+                        IPPROTO_IP)))
 
          (catch 'system-error
            (lambda ()

base-commit: e35b7c5386c1bfacf47ed31bac9b503373dd26fc
-- 
2.41.0
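
For illustration, here is a minimal sketch of the difference the commit message describes. It is not part of the patch. `non-blocking?' is a hypothetical helper, and the sketch assumes a Guile that defines SOCK_NONBLOCK and SOCK_CLOEXEC, as on GNU/Linux. The first socket is non-blocking from the moment it is created. The second only becomes non-blocking once fcntl is called, so any I/O issued before that point can still block.

```scheme
;; Sketch only: `non-blocking?' is a hypothetical helper, not a Guix procedure.
(define (non-blocking? port)
  "Return #t if the file descriptor behind PORT has O_NONBLOCK set."
  (not (zero? (logand O_NONBLOCK (fcntl port F_GETFL)))))

;; Non-blocking from the very first system call, as in the patch.
(define early
  (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK) 0))

;; Blocking at creation time; O_NONBLOCK is only added afterwards.
(define late
  (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0))

;; Record the state of LATE before and after setting the flag by hand.
(define late-before (non-blocking? late))
(fcntl late F_SETFL (logior O_NONBLOCK (fcntl late F_GETFL)))
(define late-after (non-blocking? late))
```

Between the creation of `late' and the fcntl call there is a window in which reads and writes block; passing SOCK_NONBLOCK to socket closes that window.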





Information forwarded to guix-patches <at> gnu.org:
bug#67245; Package guix-patches. (Sun, 26 Nov 2023 22:18:01 GMT)

Message #8 received at 67245 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>,
 Simon Tournier <zimon.toutoune <at> gmail.com>, Mathieu Othacehe <othacehe <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, Ricardo Wurmus <rekado <at> elephly.net>,
 67245 <at> debbugs.gnu.org, Christopher Baines <guix <at> cbaines.net>
Subject: Re: [bug#67245] [PATCH] store: Use a non-blocking socket for store
 connections.
Date: Sun, 26 Nov 2023 23:16:54 +0100

Hi Christopher,

Christopher Baines <mail <at> cbaines.net> skribis:

> For some applications, it's important to do this here rather than just making
> the socket non-blocking after the connection is established because there can
> be I/O on the socket that will block during the handshake.
>
> I've noticed this blocking during the handshake causing issues in the build
> coordinator for example.
>
> * guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass
> SOCK_NONBLOCK when calling socket.
>
> Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf

I feel we should really discuss Guix + Fibers since we’ve apparently
been going through the exact same set of issues.  :-)

(The other thing that comes to mind is the resource pool!)

> +++ b/guix/store.scm
> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
>  '&store-connection-error' upon error."
>    (let ((s (with-fluids ((%default-port-encoding #f))
>               ;; This trick allows use of the `scm_c_read' optimization.
> -             (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
> +             (socket PF_UNIX
> +                     (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
> +                     0)))

We cannot do this here because callers have to be prepared to deal with
non-blocking sockets, and that’s not the case in Guix itself.

In Cuirass, I have this:

--8<---------------cut here---------------start------------->8---
(define (non-blocking-port port)
  "Make PORT non-blocking and return it."
  (let ((flags (fcntl port F_GETFL)))
    (when (zero? (logand O_NONBLOCK flags))
      (fcntl port F_SETFL (logior O_NONBLOCK flags)))
    port))

(define (ensure-non-blocking-store-connection store)
  "Mark the file descriptor that backs STORE, a <store-connection>, as
O_NONBLOCK."
  (match (store-connection-socket store)
    ((? file-port? port)
     (non-blocking-port port))
    (_ #f)))

(define-syntax-rule (with-store/non-blocking store exp ...)
  "Like 'with-store', bind STORE to a connection to the store, but ensure that
said connection is non-blocking (O_NONBLOCK).  Evaluate EXP... in that
context."
  (with-store store
    (ensure-non-blocking-store-connection store)
    (let ()
      exp ...)))
--8<---------------cut here---------------end--------------->8---

Then ‘with-store/non-blocking’ is used in fiberized context where I know
this is fine.

I think it’ll have to remain this way until Guix itself is fiberized or
something.

Does that make sense?

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#67245; Package guix-patches. (Mon, 27 Nov 2023 10:08:02 GMT)

Message #11 received at 67245 <at> debbugs.gnu.org:

From: Christopher Baines <mail <at> cbaines.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 67245 <at> debbugs.gnu.org
Subject: Re: [bug#67245] [PATCH] store: Use a non-blocking socket for store
 connections.
Date: Mon, 27 Nov 2023 09:48:09 +0000

Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi Christopher,
>
> Christopher Baines <mail <at> cbaines.net> skribis:
>
>> For some applications, it's important to do this here rather than just making
>> the socket non-blocking after the connection is established because there can
>> be I/O on the socket that will block during the handshake.
>>
>> I've noticed this blocking during the handshake causing issues in the build
>> coordinator for example.
>>
>> * guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass
>> SOCK_NONBLOCK when calling socket.
>>
>> Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf
>
> I feel we should really discuss Guix + Fibers since we’ve apparently
> been going through the exact same set of issues.  :-)
>
> (The other thing that comes to mind is the resource pool!)

I'm mostly ignoring these issues, then copying the code once you write
it :)

>> +++ b/guix/store.scm
>> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
>>  '&store-connection-error' upon error."
>>    (let ((s (with-fluids ((%default-port-encoding #f))
>>               ;; This trick allows use of the `scm_c_read' optimization.
>> -             (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
>> +             (socket PF_UNIX
>> +                     (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
>> +                     0)))
>
> We cannot do this here because callers have to be prepared to deal with
> non-blocking sockets, and that’s not the case in Guix itself.

I can see potential problems for programs outside of Guix which use
suspendable ports, but given Guix doesn't use suspendable ports, this
won't change behaviour, right?

Obviously Guile will be working a bit differently, using poll when it
needs to wait for I/O, but at the Scheme level within Guix, things
should be no different.

I tried guix weather with this change, and things seemed fine. Is there
a specific bit of Guix you're concerned about?
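
To illustrate the point about poll, here is a small sketch, assuming Guile 3.0 on GNU/Linux with suspendable ports not installed. The socket pair and the writer thread are stand-ins for the daemon connection, not anything Guix does. The read is issued on an O_NONBLOCK socket that has no data yet. Guile waits for the descriptor to become readable and retries, so the Scheme code sees an ordinary blocking read.

```scheme
(use-modules (ice-9 threads)
             (ice-9 binary-ports))

;; A connected pair of non-blocking sockets standing in for the
;; client and daemon ends of a store connection.
(define pair
  (socketpair PF_UNIX (logior SOCK_STREAM SOCK_NONBLOCK) 0))
(define reader (car pair))
(define writer (cdr pair))

;; Send one byte from another thread after a short delay, so that the
;; read below starts while no data is available.
(define sender
  (call-with-new-thread
   (lambda ()
     (usleep 200000)
     (put-u8 writer 42)
     (force-output writer))))

;; Without suspendable ports, Guile waits for READER to become readable
;; and retries, rather than exposing the "would block" condition here.
(define byte (get-u8 reader))

(join-thread sender)
```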

Information forwarded to guix-patches <at> gnu.org:
bug#67245; Package guix-patches. (Thu, 30 Nov 2023 21:13:01 GMT)

Message #14 received at 67245 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 67245 <at> debbugs.gnu.org
Subject: Re: [bug#67245] [PATCH] store: Use a non-blocking socket for store
 connections.
Date: Thu, 30 Nov 2023 22:11:58 +0100

Hi Chris,

Christopher Baines <mail <at> cbaines.net> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>> I feel we should really discuss Guix + Fibers since we’ve apparently
>> been going through the exact same set of issues.  :-)
>>
>> (The other thing that comes to mind is the resource pool!)
>
> I'm mostly ignoring these issues, then copying the code once you write
> it :)

Heh, so we’re already in sync maybe, not bad.  :-)

>>> +++ b/guix/store.scm
>>> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
>>>  '&store-connection-error' upon error."
>>>    (let ((s (with-fluids ((%default-port-encoding #f))
>>>               ;; This trick allows use of the `scm_c_read' optimization.
>>> -             (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
>>> +             (socket PF_UNIX
>>> +                     (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
>>> +                     0)))
>>
>> We cannot do this here because callers have to be prepared to deal with
>> non-blocking sockets, and that’s not the case in Guix itself.
>
> I can see potential problems for programs outside of Guix which use
> suspendable ports, but given Guix doesn't use suspendable ports, this
> won't change behaviour, right?
>
> Obviously Guile will be working a bit differently, using poll when it
> needs to wait for I/O, but at the Scheme level within Guix, things
> should be no different.

Hmm yes, I think you’re right.

One issue is if we hand over the file descriptor to something that’s not
Guile.  Off the top of my head, this happens with inferiors and in the
‘build’ procedure of ‘build-self.scm’ (well, the process that receives
that file descriptor is Guile, but if it’s older than 3.0 (?), then it
may behave differently.)
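
The reason the flag travels along with the file descriptor can be sketched as follows. O_NONBLOCK is a file status flag stored on the open file description, so a duplicate made with dup, or a descriptor inherited by a child process, observes it too. The helper `non-blocking-fd?' is hypothetical, and the sketch assumes GNU/Linux.

```scheme
;; Sketch only: `non-blocking-fd?' is a hypothetical helper.
(define (non-blocking-fd? fd)
  "Return #t if O_NONBLOCK is set for FD, a port or a file descriptor number."
  (not (zero? (logand O_NONBLOCK (fcntl fd F_GETFL)))))

;; A socket created the way the patch proposes.
(define s
  (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK) 0))

;; Duplicate the descriptor, as happens in effect when it is handed to
;; another process; the copy shares the same open file description.
(define copy (dup->fdes s))

(define copy-non-blocking? (non-blocking-fd? copy))
(close-fdes copy)
```

Whatever ends up reading from the copy therefore has to cope with EAGAIN on its own.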

So it should be safe indeed, but adds a bit of overhead (hopping via
‘current-{read,write}-waiter’) and needs good testing.
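
As an illustration of the waiter hop, here is a sketch that assumes (ice-9 suspendable-ports) is installed, which is what Fibers-based programs do and which Guix itself does not do according to this thread. The waiter below is a toy: it counts how often it is called and makes a byte available, whereas a real one would suspend the current fiber.

```scheme
(use-modules (ice-9 suspendable-ports)
             (ice-9 binary-ports))

;; Switch port operations to the Scheme implementations that consult
;; `current-read-waiter' and `current-write-waiter'.
(install-suspendable-ports!)

(define pair
  (socketpair PF_UNIX (logior SOCK_STREAM SOCK_NONBLOCK) 0))
(define reader (car pair))
(define writer (cdr pair))

;; Number of times the read waiter was invoked.
(define waits 0)

(define byte
  (parameterize ((current-read-waiter
                  (lambda (port)
                    ;; Called when a read on PORT would block.  This toy
                    ;; waiter makes data available so the retry succeeds.
                    (set! waits (+ waits 1))
                    (put-u8 writer 42)
                    (force-output writer))))
    (get-u8 reader)))
```

Each read that would block goes through the waiter procedure before it is retried.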

Laziness gives an incentive for the status quo, but I’m not opposed to
the change if we get more confidence (test suite passing, tests with
inferiors and ‘time-machine’, and some more auditing.)

Ludo’.




This bug report was last modified 155 days ago.
