GNU bug report logs - #77992
Childhurd stuck booting

Package: guix;

Reported by: yelninei <at> tutamail.com

Date: Tue, 22 Apr 2025 15:46:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

To reply to this bug, email your comments to 77992 AT debbugs.gnu.org.
There is no need to reopen the bug first.


Report forwarded to bug-guix <at> gnu.org:
bug#77992; Package guix. (Tue, 22 Apr 2025 15:46:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to yelninei <at> tutamail.com:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Tue, 22 Apr 2025 15:46:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: bug-guix <at> gnu.org
Subject: Childhurd stuck booting
Date: Tue, 22 Apr 2025 17:45:17 +0200 (GMT+02:00)
Hello,

After I updated today my hurd-vm is stuck booting.
Looking at the output, it seems to receive the secrets from the host correctly, but then nothing happens after "Service secret-service-client has been started".
Not Working: c6ee7b0f79632d50ad491b75c240547be8f40c31
Working: 54cc9c96ec0877b2afa24871c3acd8af27b0d500

Because Python cannot be cross-compiled for most of the commit range between these two, this is hard to bisect. I haven't found time to look into it further yet.

Yelninei




Information forwarded to bug-guix <at> gnu.org:
bug#77992; Package guix. (Tue, 22 Apr 2025 20:24:01 GMT) Full text and rfc822 format available.

Message #8 received at 77992 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: 77992 <at> debbugs.gnu.org
Subject: Childhurd stuck booting
Date: Tue, 22 Apr 2025 22:23:31 +0200 (GMT+02:00)
Reverting da741d89310efd0530351670d9c55ec2f952ab98 "services: account: Create /var/guix/profiles/per-user/$USER." fixes this, but I am not sure why.


Finding this was a lot of trial and error (bisecting did not work because of the Python cross-compilation failure), but sshd not showing up is caught by the childhurd system test. Encountering a record ABI mismatch that required recompiling the entire guix tree slowed this down as well.

Also, https://issues.guix.gnu.org/77610 is causing the rest of the failures in the childhurd system test, which expects the guix daemon to be available immediately. I started looking around in glibc and hurd but I haven't found a good setup yet to easily try changes without a full rebuild.




Information forwarded to bug-guix <at> gnu.org:
bug#77992; Package guix. (Tue, 22 Apr 2025 22:18:02 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org>
Cc: 77992 <at> debbugs.gnu.org, yelninei <at> tutamail.com
Subject: Re: bug#77992: Childhurd stuck booting
Date: Wed, 23 Apr 2025 00:17:12 +0200
[Message part 1 (text/plain, inline)]
Hi,

yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:

> Reverting da741d89310efd0530351670d9c55ec2f952ab98 "services: account: Create /var/guix/profiles/per-user/$USER." fixes this, but I am not sure why.

Wow, thanks for bisecting this; I would never have thought this could be
a problem.

I built the image for ‘bare-hurd.tmpl’ and booted it (with
“console=com1” on the Mach command line) and here’s what we see:

--8<---------------cut here---------------start------------->8---
shepherd[1]: Starting service file-systems...
shepherd[1]: Service file-systems started.
shepherd[1]: Service file-systems running with value #t.
shepherd[1]: Service file-systems has been started.
shepherd[1]: Starting service user-homes...
shepherd[1]: Service user-homes failed to start.
shepherd[1]: Exception caught while starting user-homes: (misc-error "scm_fdes_to_port" "requested file mode not available on fdes" () #f)
shepherd[1]: Service loopback has been started.
shepherd[1]: Service loopback started.
shepherd[1]: Service loopback running with value #t.
--8<---------------cut here---------------end--------------->8---

The ‘user-homes’ service fails to start, so basically the system isn’t
brought up.

The culprit appears to be ‘mkdir-p/perms’:

--8<---------------cut here---------------start------------->8---
ludo <at> childhurd ~$ rpctrace -o log guile -c '(use-modules (gnu build activation)) (mkdir-p/perms "foo/bar/baz" (getpwnam "ludo") #o755)'
Backtrace:
In ice-9/boot-9.scm:
  1752:10  7 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
           6 (apply-smob/0 #<thunk 20f91a0>)
In ice-9/boot-9.scm:
    724:2  5 (call-with-prompt _ _ #<procedure default-prompt-handle?>)
In ice-9/eval.scm:
    619:8  4 (_ #(#(#<directory (guile-user) 20ec6e0>)))
In ice-9/command-line.scm:
   185:19  3 (_ #<input: string 2106fc0>)
In unknown file:
           2 (eval (mkdir-p/perms "foo/bar/baz" (getpwnam "ludo") #) #)
In gnu/build/activation.scm:
    97:20  1 (mkdir-p/perms _ #("ludo" "x" 1000 998 "Ludovic Cou?" ?) ?)
In unknown file:
           0 (open "." 7340032 #<undefined>)

ERROR: In procedure open:
In procedure scm_fdes_to_port: requested file mode not available on fdes
--8<---------------cut here---------------end--------------->8---

The relevant log snippet is this:

--8<---------------cut here---------------start------------->8---
  17<--33(pid168)->dir_lookup ("etc/passwd" 4194305 0) = 0 1 ""    66<--74(pid168)
  66<--74(pid168)->term_getctty () = 0xfffffed1 ((ipc/mig) bad request message ID) 
  66<--74(pid168)->io_stat_request () = 0 {23 7 0 56029 0 1745320104 0 33188 1 0 0 1841 0 17453
19370 840000000 1745319369 220000000 1745319369 220000000 8192 8 0 0 0 0 0 0 0 0 0 0 0}
  66<--74(pid168)->io_seek_request (0 0) = 0 0
  66<--74(pid168)->io_read_request (-1 8192) = 0 "root:x:0:0:System administrator:/root:/gnu/st
ore/a1vynvd381hxsf979qzv8r25bc3pd2r"
task13(pid168)-> 3206 (pn{ 30}) = 0 
  20<--32(pid168)->dir_lookup ("." 7340160 0) = 0 1 ""    66<--70(pid168)
  66<--70(pid168)->io_stat_request () = 0 {23 7 0 264001 0 1745320625 0 16832 3 1000 998 4096 0
 1745342831 30000000 1745342821 950000000 1745319372 110000000 8192 8 0 0 0 8388736 8388736 838
8736 8388736 8388736 8388736 8388736 8388736}
  66<--70(pid168)->term_getctty () = 0xfffffed1 ((ipc/mig) bad request message ID) 
  66<--70(pid168)->io_get_openmodes_request () = 0 0
  25<--37(pid168)->io_write_request ("Backtrace:\n" -1) = 0 11
--8<---------------cut here---------------end--------------->8---

The ‘io_get_openmodes’ RPC corresponds to F_GETFL in
‘scm_i_fdes_is_valid’ in Guile.
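
As an aside, the flag value passed to 'open' in the backtrace (7340032) can be decoded by hand. The sketch below assumes the GNU/Hurd values of O_RDONLY, O_NOFOLLOW, O_DIRECTORY and O_CLOEXEC as recalled from glibc's Hurd headers; these numbers are not stated in this thread, and the `HURD_FLAGS` table and `decode` helper are purely illustrative names.

```python
# Assumed GNU/Hurd open flag values (recalled from glibc's Hurd
# <bits/fcntl.h>; not stated in this thread, so treat them as assumptions).
HURD_FLAGS = {
    "O_RDONLY": 0x0001,
    "O_NOFOLLOW": 0x00100000,
    "O_DIRECTORY": 0x00200000,
    "O_CLOEXEC": 0x00400000,
}


def decode(value, table=HURD_FLAGS):
    """Split VALUE into known flag names and a remainder of unnamed bits."""
    names = [name for name, bit in table.items() if value & bit]
    remainder = value
    for bit in table.values():
        remainder &= ~bit
    return names, remainder


# Flags passed to 'open' in the backtrace.
print(decode(7340032))
# Flags seen by 'dir_lookup' in the rpctrace log; one extra bit stays unnamed.
print(decode(7340160))
```

Under these assumptions, 7340032 is exactly the three flags that 'mkdir-p/perms' combines, with no read bit among them.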

Can be reproduced with just this:

  guile -c '(open "." O_DIRECTORY)'

I think ‘flags_to_mode’ in Guile returns “r” on Linux, which is fine
because O_RDONLY is set.  But on the Hurd, O_RDONLY is not set:

--8<---------------cut here---------------start------------->8---
ludo <at> childhurd ~$ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY) F_GETFL))'

;;; ("i586-pc-gnu" 0)
--8<---------------cut here---------------end--------------->8---

vs.:

--8<---------------cut here---------------start------------->8---
$ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY) F_GETFL))'

;;; ("x86_64-unknown-linux-gnu" 98304)
--8<---------------cut here---------------end--------------->8---
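
The same probe can be run outside Guile. This is a minimal sketch using Python's standard `os` and `fcntl` modules; `probe_directory_flags` is a hypothetical helper name, and the access mask is built from the three access modes rather than taken from any Guile code.

```python
import fcntl
import os


def probe_directory_flags(path="."):
    """Open PATH with O_DIRECTORY only; return (F_GETFL flags, access bits)."""
    fd = os.open(path, os.O_DIRECTORY)
    try:
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    finally:
        os.close(fd)
    # Mask built from the three access modes instead of relying on O_ACCMODE.
    access_mask = os.O_RDONLY | os.O_WRONLY | os.O_RDWR
    return flags, flags & access_mask


flags, access = probe_directory_flags()
# Last field: do the reported access bits equal this platform's O_RDONLY?
print(os.uname().sysname, flags, access, access == os.O_RDONLY)
```

On Linux the last field should be true because O_RDONLY is 0 there; the output above suggests it would be false on the Hurd, where F_GETFL reports 0 but O_RDONLY is non-zero.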

Long story short, O_RDONLY = 0 on Linux but it’s non-zero on the Hurd,
so to placate ‘scm_i_fdes_is_valid’, we need to show it that the
directory is opened with O_RDONLY:

[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/build/activation.scm b/gnu/build/activation.scm
index 11f7c82d67..038d8327de 100644
--- a/gnu/build/activation.scm
+++ b/gnu/build/activation.scm
@@ -90,6 +90,7 @@ (define (mkdir-p/perms directory owner bits)
   ;; By combining O_NOFOLLOW and O_DIRECTORY, this procedure automatically
   ;; verifies that no components are symlinks.
   (define open-flags (logior O_CLOEXEC ; don't pass the port on to subprocesses
+                             O_RDONLY  ;need on the Hurd, harmless on Linux
                              O_NOFOLLOW ; don't follow symlinks
                              O_DIRECTORY)) ; reject anything not a directory
 
[Message part 3 (text/plain, inline)]
Tested on both systems and it seems to work.

Let me know how it goes for you!
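
For illustration, here is a rough model of why the extra flag matters. It is not a transcription of Guile's 'scm_i_fdes_is_valid'; it only compares the access bits reported by F_GETFL with what a read-mode port needs. The Hurd constants are assumptions (the message above only says O_RDONLY is non-zero there), and `Platform` and `readable_port_allowed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    o_rdonly: int
    o_wronly: int
    o_rdwr: int


LINUX = Platform("linux", o_rdonly=0, o_wronly=1, o_rdwr=2)
# Assumed Hurd values: the thread only says O_RDONLY is non-zero there.
HURD = Platform("hurd", o_rdonly=1, o_wronly=2, o_rdwr=3)


def readable_port_allowed(getfl_flags, platform):
    """Simplified model: may a read-mode port wrap a descriptor with these flags?"""
    mask = platform.o_rdonly | platform.o_wronly | platform.o_rdwr
    access = getfl_flags & mask
    return access in (platform.o_rdonly, platform.o_rdwr)


# Without O_RDONLY in the open flags, F_GETFL reported 0 on the Hurd.
print(readable_port_allowed(0, HURD))              # False
# With O_RDONLY requested, the read bit is assumed to be reported back.
print(readable_port_allowed(HURD.o_rdonly, HURD))  # True
# On Linux, 98304 has access bits 0, which already equals O_RDONLY.
print(readable_port_allowed(98304, LINUX))         # True
```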

> Finding this was a lot of trial and error (bisecting did not work
> because of the python cross compilation failure) but sshd not showing
> up is caught by the childhurd system test. Encountering a record ABI
> mismatch requiring a recompile of the entire guix tree slowed this
> down as well.

For the ABI mismatch, you could probably rebuild just the small subset
of modules affected by this (for example, those that refer to
<guix-configuration> if that’s what’s involved).

> Also https://issues.guix.gnu.org/77610 is causing the rest of the
> failures in the childhurd system test which expect the guix daemon to
> be available immediately. I started looking around in glibc and hurd
> but I haven't found a good setup yet to easily try changes without a
> full rebuild.

For such things, I found that testing interactively in QEMU is best.

Thanks for finding and debugging this!

Ludo’.

Information forwarded to bug-guix <at> gnu.org:
bug#77992; Package guix. (Tue, 22 Apr 2025 22:18:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#77992; Package guix. (Wed, 23 Apr 2025 08:36:03 GMT) Full text and rfc822 format available.

Message #17 received at 77992 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 77992 <at> debbugs.gnu.org
Subject: Re: bug#77992: Childhurd stuck booting
Date: Wed, 23 Apr 2025 10:34:31 +0200 (GMT+02:00)
Hello

Apr 22, 2025, 22:17 by ludo <at> gnu.org:

>
> Can be reproduced with just this:
>
>  guile -c '(open "." O_DIRECTORY)'
>
> I think ‘flags_to_mode’ in Guile returns “r” on Linux, which is fine
> because O_RDONLY is set.  But on the Hurd, O_RDONLY is not set:
>
> --8<---------------cut here---------------start------------->8---
> ludo <at> childhurd ~$ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY) F_GETFL))'
>
> ;;; ("i586-pc-gnu" 0)
> --8<---------------cut here---------------end--------------->8---
>
> vs.:
>
> --8<---------------cut here---------------start------------->8---
> $ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY) F_GETFL))'
>
> ;;; ("x86_64-unknown-linux-gnu" 98304)
> --8<---------------cut here---------------end--------------->8---
>
> Long story short, O_RDONLY = 0 on Linux but it’s non-zero on the Hurd,
> so to placate ‘scm_i_fdes_is_valid’, we need to show it that the
> directory is opened with O_RDONLY:
>
>
> Tested on both systems and it seems to work.
>
> Let me know how it goes for you!
>
Thank you, this fixes the issue for me and I did not notice a difference on Linux.



>> Finding this was a lot of trial and error (bisecting did not work
>> because of the python cross compilation failure) but sshd not showing
>> up is caught by the childhurd system test. Encountering a record ABI
>> mismatch requiring a recompile of the entire guix tree slowed this
>> down as well.
>>
>
> For the ABI mismatch, you could probably rebuild just the small subset
> of modules affected by this (for example, those that refer to
> <guix-configuration> if that’s what’s involved).
>

Yeah, but I was not in the mood to figure out which modules would need to be rebuilt, so I just threw all of them away. I will keep this in mind. My brute-force method was sufficient here, but it would have been a lot faster with a few more tricks.

> Thanks for finding and debugging this!
>

I guess I am finding all the weird Hurd bugs lately.
>
> Ludo’.
>





Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Wed, 23 Apr 2025 10:33:18 GMT) Full text and rfc822 format available.

Notification sent to yelninei <at> tutamail.com:
bug acknowledged by developer. (Wed, 23 Apr 2025 10:33:18 GMT) Full text and rfc822 format available.

Message #22 received at 77992-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei <at> tutamail.com
Cc: 77992-done <at> debbugs.gnu.org
Subject: Re: bug#77992: Childhurd stuck booting
Date: Wed, 23 Apr 2025 12:29:51 +0200
Hi,

yelninei <at> tutamail.com writes:

>> Tested on both systems and it seems to work.
>>
>> Let me know how it goes for you!
>>
> Thank you, this fixes the issue for me and I did not notice a difference on Linux.

Thanks for checking, pushed as 27e62d4481a02f1016c7a72bedb946d92ceecf49.

Ludo’.




This bug report was last modified 1 day ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.