GNU bug report logs - #77610
guix-daemon socket activation does not work on the hurd


Package: guix;

Reported by: yelninei <at> tutamail.com

Date: Mon, 7 Apr 2025 16:30:03 UTC

Severity: normal

To reply to this bug, email your comments to 77610 AT debbugs.gnu.org.



Report forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Mon, 07 Apr 2025 16:30:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to yelninei <at> tutamail.com:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 07 Apr 2025 16:30:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: bug-guix <at> gnu.org
Subject: guix-daemon socket activation does not work on the hurd
Date: Mon, 7 Apr 2025 18:29:29 +0200 (CEST)
Hi,

Today I reconfigured my system, and after a reboot I am unable to use guix-daemon on a childhurd.


guix build hello -n
guix build: error: failed to connect to `/var/guix/daemon-socket/socket': Protocol error

Offloading:
guix offload: error: failed to connect over SSH to daemon at 'localhost', socket /var/guix/daemon-socket/socket

Daemon Logs:
socket-activated with 1 socket
unexpected build daemon error: reading from file: Resource temporarily unavailable

Starting the daemon as the root user normally continues to work as before, so I suspect the socket activation change is to blame.
Guix commit: 6af680670bf9055b90e6f8b63c4c2ab7b08e7c56




Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Wed, 09 Apr 2025 10:30:03 GMT) Full text and rfc822 format available.

Message #8 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: 77610 <at> debbugs.gnu.org
Subject: Re: guix-daemon socket activation does not work on the hurd
Date: Wed, 9 Apr 2025 12:29:09 +0200 (CEST)
After mentioning this on IRC Ludovic pushed 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 to the shepherd repo.

I wanted to try this out and reconfigured using the shepherd from this commit as pid1 in the vm (a bit tricky because of help2man).

The first connection still fails in the same way:
unexpected build daemon error: reading from file: Resource temporarily unavailable

A client mentions:
guix build: error: corrupt input while restoring archive from #<closed: file 2396ea8>

However subsequent connections work.




Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Tue, 15 Apr 2025 16:09:02 GMT) Full text and rfc822 format available.

Message #11 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 77610 <at> debbugs.gnu.org,  yelninei <at> tutamail.com
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Tue, 15 Apr 2025 18:07:43 +0200
yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:

> After mentioning this on IRC Ludovic pushed 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 to the shepherd repo.
>
> I wanted to try this out and reconfigured using the shepherd from this commit as pid1 in the vm (a bit tricky because of help2man).
>
> The first connection still fails in the same way:
> unexpected build daemon error: reading from file: Resource temporarily unavailable

I looked a bit into this, and I think shepherd is working as expected,
making the socket blocking before executing guix-daemon (it’s clear
when stracing it on Linux).

So there must be something specific at play on the Hurd.

I tried this snippet (server on one side, client on the other side) and
it works as expected: ‘accept’ blocks and subsequent read does not get
EAGAIN.

So I’m at a loss here.  Does ‘tests/systemd.sh’ succeed when run natively?
(In particular the check added in
8d31cafbdcb818160852a5d1e6fc24c1a9c53e41.)

Thanks,
Ludo’.

[non-blocking-hurd.scm (text/plain, inline)]
(use-modules (ice-9 match))

(define (blocking-port port)
  "Return PORT after putting it in blocking mode (clearing O_NONBLOCK)."
  (let ((flags (fcntl port F_GETFL)))
    (fcntl port F_SETFL (logand (lognot O_NONBLOCK) flags))
    port))

(let ((sock (socket AF_UNIX (logior SOCK_STREAM SOCK_NONBLOCK) 0)))
  (bind sock AF_UNIX "/tmp/sock")
  (listen sock 10)
  (match (pk 'x (accept (blocking-port sock) SOCK_CLOEXEC)) ;should block
    ((port . _)
     (pk 'read (read port)))))

;; Client:
(let ((sock (socket AF_UNIX (logior SOCK_STREAM) 0)))
  (connect sock AF_UNIX "/tmp/sock")
  (display "hi!\n" sock))
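
For reference, "Resource temporarily unavailable" is the usual message for EAGAIN, which read(2) reports on a non-blocking descriptor when no data is available yet. The following is a minimal C sketch of that behaviour, not code from guix-daemon: the function name read_would_block is made up for illustration, and a socketpair stands in for the daemon connection.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if reading from an empty non-blocking socket fails with EAGAIN,
   0 if it does not, and -1 if the setup itself fails.
   (Hypothetical helper, for illustration only.)  */
static int read_would_block(void)
{
    int pair[2];
    int flags, result;
    char byte;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
        return -1;

    /* Put one end in non-blocking mode.  */
    flags = fcntl(pair[0], F_GETFL);
    if (flags < 0 || fcntl(pair[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(pair[0]);
        close(pair[1]);
        return -1;
    }

    /* Nothing was written on pair[1], so there is no data to read yet.  */
    n = read(pair[0], &byte, 1);
    result = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) ? 1 : 0;

    close(pair[0]);
    close(pair[1]);
    return result;
}
```

On a blocking descriptor the same read would wait for the peer instead of failing, which is why the O_NONBLOCK state of the socket the daemon reads from matters here.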

Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Wed, 16 Apr 2025 18:09:07 GMT) Full text and rfc822 format available.

Message #14 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Wed, 16 Apr 2025 20:08:14 +0200 (CEST)
Hello,

Apr 15, 2025, 16:08 by ludo <at> gnu.org:

> yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:
>
>> After mentioning this on IRC Ludovic pushed 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 to the shepherd repo.
>>
>> I wanted to try this out and reconfigured using the shepherd from this commit as pid1 in the vm (a bit tricky because of help2man).
>>
>> The first connection still fails in the same way:
>> unexpected build daemon error: reading from file: Resource temporarily unavailable
>>
>
> I looked a bit into this, and I think shepherd is working as expected,
> making the socket blocking before executing guix-daemon (it’s clear
> when stracing it on Linux).
>
> So there must be something specific at play on the Hurd.
>
> I tried this snippet (server on one side, client on the other side) and
> it works as expected: ‘accept’ blocks and subsequent read does not get
> EAGAIN.
>
> So I’m at a loss here.  Does ‘tests/systemd.sh’ succeed when run natively?
> (In particular the check added in
> 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41.)
>

Yes, it is passing both on 1.0.3 and 1.0.4. The only thing failing now is the system-log test.
As before, when using #:lazy-start? #f it works as expected, which makes the only difference the timing of the first connection. What would the most minimal guix-daemon client need to look like to trigger the EAGAIN?

I tried to verify that the port is definitely blocking before being passed to guix-daemon, and it is. I am very confused.

Do you know of other processes (with not a lot of dependencies) that can be socket activated to try to replicate this with something less complicated than guix-daemon?




Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Wed, 16 Apr 2025 20:20:02 GMT) Full text and rfc822 format available.

Message #17 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei <at> tutamail.com
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Wed, 16 Apr 2025 22:19:17 +0200
Hi,

yelninei <at> tutamail.com writes:

>> So I’m at loss here.  Does ‘tests/systemd.sh’ succeed when ran natively?
>> (In particular the check added in
>> 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41.)
>>
>
> Yes, it is passing both on 1.0.3 and 1.0.4. The only thing failing now is the system-log test.

Intriguing.

> As before, when using #:lazy-start? #f it works as expected, which makes
> the only difference the timing of the first connection. What would the
> most minimal guix-daemon client need to look like to trigger the
> EAGAIN?
>
> I tried to verify that the port is definitely blocking before being passed to guix-daemon, and it is. I am very confused.
>
> Do you know of other processes (with not a lot of dependencies) that can be socket activated to try to replicate this with something less complicated than guix-daemon?

Well there’s ‘guix publish’, and otherwise the examples from
‘tests/systemd.sh’ (following ‘define %command’).

Otherwise we could mimic it by writing a C program that opens a
SOCK_NONBLOCK socket, binds + listens + select(2) until something
happens, then calls fcntl(2) to clear the O_NONBLOCK flag, and then
forks + execs and calls accept(2) in the child process.

Ludo’.
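
The steps described above could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not shepherd's or guix-daemon's code: all function names are hypothetical, the exec step is left out so that the child accepts directly, and the child reads once to see whether it gets EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill ADDR with the Unix-domain address for PATH.  */
static void unix_address(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof addr->sun_path - 1);
}

/* Create a SOCK_NONBLOCK listening socket bound to PATH.  */
static int listen_nonblocking(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0);

    if (fd < 0)
        return -1;
    unix_address(&addr, path);
    unlink(path);                       /* remove a stale socket file */
    if (bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0
        || listen(fd, 10) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: connect to PATH and send a short greeting.  */
static int connect_and_greet(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    unix_address(&addr, path);
    if (connect(fd, (struct sockaddr *) &addr, sizeof addr) < 0
        || write(fd, "hi!\n", 4) != 4) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait for a connection with select(2), clear O_NONBLOCK, fork, then
   accept and read in the child.  Return 0 if the child's read succeeded,
   1 if it failed with EAGAIN, and -1 on any other failure.  */
static int activate_and_read(int listener)
{
    fd_set readable;
    int flags, status;
    pid_t pid;

    FD_ZERO(&readable);
    FD_SET(listener, &readable);
    if (select(listener + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    flags = fcntl(listener, F_GETFL);
    if (flags < 0 || fcntl(listener, F_SETFL, flags & ~O_NONBLOCK) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* A real activator would exec the service here; this sketch
           accepts in the child directly to stay in one file.  */
        char buf[16];
        int conn = accept(listener, NULL, NULL);

        if (conn < 0)
            _exit(2);
        if (read(conn, buf, sizeof buf) >= 0)
            _exit(0);
        _exit(errno == EAGAIN ? 1 : 2);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)
        || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}
```

A driver would call listen_nonblocking, then connect_and_greet from the client side, then activate_and_read on the listener. A result of 1 would correspond to the EAGAIN symptom discussed in this thread.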




Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Fri, 18 Apr 2025 08:23:08 GMT) Full text and rfc822 format available.

Message #20 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Fri, 18 Apr 2025 10:21:20 +0200 (GMT+02:00)
Hello,

Apr 16, 2025, 20:19 by ludo <at> gnu.org:

> Well there’s ‘guix publish’, and otherwise the examples from
> ‘tests/systemd.sh’ (following ‘define %command’).
>
> Otherwise we could mimic it by writing a C program that opens a
> SOCK_NONBLOCK socket, binds + listens + select(2) until something
> happens, then calls fcntl(2) to clear the O_NONBLOCK flag, and then
> forks + execs and calls accept(2) in the child process.
>
> Ludo’.
>
I tested guix-publish and that had no issues.

Some checks I did yesterday with guix-daemon:
- Shepherd is passing a blocking socket.
- The "fdSocket" in "acceptConnection" is always blocking.
- The "remote" socket in "acceptConnection" is O_NONBLOCK on the first connection only.
- The "from.fd" socket in "processConnection" is then also O_NONBLOCK on the first connection. This causes EAGAIN when trying to read the clientVersion.

On Linux none of this is an issue.
Adding to tests/systemd.sh the same O_NONBLOCK check as for the fd 3 socket, but applied to the "connection" socket after accept, passes on Linux but causes a failure on the Hurd.

Is glibc accept doing something weird?
I am struggling to understand how the first connection would be any different from subsequent ones (and only in the #:lazy-start? #t case).

I am unsure what to do about this because shepherd seems to do everything correctly. I saw that ci.g.g.o has started to build i586-gnu substitutes (in particular gcc-final), but if you are restarting the builders more aggressively now, then each first build will fail because of this, and I don't know whether Cuirass can reschedule builds on such failures.

Maybe the easiest is to expose the #:lazy-start? option for now and disable it for guix-daemon in %base-services/hurd?
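
The O_NONBLOCK checks listed in this message amount to testing the F_GETFL flags of the listening socket and of the socket returned by accept(2). A minimal C sketch of such a check follows. The helper names are hypothetical, and it does not reproduce lazy activation, since the listener here is blocking from the start.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Return 1 if FD has O_NONBLOCK set, 0 if it is blocking, -1 on error.  */
static int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Fill ADDR with the Unix-domain address for PATH.  */
static void unix_address(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof addr->sun_path - 1);
}

/* Create a blocking listening socket bound to PATH.  */
static int listen_blocking(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    unix_address(&addr, path);
    unlink(path);                       /* remove a stale socket file */
    if (bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0
        || listen(fd, 10) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: connect to the socket at PATH.  */
static int connect_to(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    unix_address(&addr, path);
    if (connect(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept one connection on LISTENER, store it in *CONN, and report
   whether the accepted socket is non-blocking (1), blocking (0), or
   whether accept failed (-1).  */
static int accepted_is_nonblocking(int listener, int *conn)
{
    *conn = accept(listener, NULL, NULL);
    return *conn < 0 ? -1 : fd_is_nonblocking(*conn);
}
```

With a blocking listener, the accepted socket is expected to be blocking as well; the report above is that on the Hurd the first accepted socket is not.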











Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Fri, 18 Apr 2025 09:43:05 GMT) Full text and rfc822 format available.

Message #23 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei <at> tutamail.com
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Fri, 18 Apr 2025 11:42:17 +0200
Hi,

yelninei <at> tutamail.com writes:

> I tested guix-publish and that had no issues.

You mean the first ‘wget -O …’ passes?

> Some checks I did yesterday with guix-daemon:
> - Shepherd is passing a blocking socket
> - The "fdSocket" in "acceptConnection" is always blocking.
> - the "remote" socket in "acceptConnection" is O_NONBLOCK on the first connection only.

Looking at ‘accept4.c’ in libc, the only way ‘remote’ can be O_NONBLOCK
is if:

  1. ‘accept4’ is passed SOCK_NONBLOCK, but that’s not the case here
     (see ‘accept.c’);

  2. ‘__socket_accept’ returns an O_NONBLOCK socket, which would be a bug
     in the server, pflocal.

At first sight ‘S_io_set_all_openmodes’ in pflocal does the job and
‘S_socket_accept’ honors those flags.

> Adding the same check as for the fd 3 socket for O_NONBLOCK to the
> "connection" socket after accept to tests/systemd.sh passes on Linux
> but causes a failure on the Hurd.

So we have a reproducer.

Could you pass it on to bug-hurd? :-)  It may be easier if the whole
thing is in C.

> I am unsure what to do about this because shepherd seems to do
> everything correctly. I saw that ci.g.g.o has started to build
> i586-gnu substitutes (in particular gcc-final) but if you are
> restarting the builders more aggressively now then each first build
> will fail because of this and idk if cuirass can reschedule builds on
> such failures.

Yeah, it’s not great.  Those will have to be restarted manually I’m
afraid, but most of the time anybody can click on the “Restart” button
in Cuirass.

> Maybe the easiest is to expose the #:lazy-start? option for now and disable it for guix-daemon in %base-services/hurd?

Hmm maybe.  Let’s first figure out if this is a Hurd bug.

Thanks for investigating!

Ludo’.




This bug report was last modified 15 days ago.
