GNU bug report logs - #64653
‘static-networking’ fails to start

Package: guix;

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Sat, 15 Jul 2023 20:06:02 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 64653 in the body.
You can then email your comments to 64653 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-guix <at> gnu.org:
bug#64653; Package guix. (Sat, 15 Jul 2023 20:06:02 GMT)

Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 15 Jul 2023 20:06:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: bug-guix <at> gnu.org
Subject: ‘static-networking’ fails to start
Date: Sat, 15 Jul 2023 22:04:59 +0200
Hi!

On the machine that exhibited <https://issues.guix.gnu.org/63516>, I’m
now seeing this, with the fix from commit
26602f4063a6e0c626e8deb3423166bcd0abeb90:

--8<---------------cut here---------------start------------->8---
[  121.017522] shepherd[1]: Starting service user-homes...
[  121.049038] tg3 0000:05:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address b8:cb:29:b5:1c:3a
[  121.049042] tg3 0000:05:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[  121.049044] tg3 0000:05:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[  121.049045] tg3 0000:05:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
[  121.084342] tg3 0000:05:00.1 eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address b8:cb:29:b5:1c:3b
[  121.084355] tg3 0000:05:00.1 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[  121.084363] tg3 0000:05:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[  121.084370] tg3 0000:05:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[  121.102367] iTCO_vendor_support: vendor-support=0
[  121.103831] Error: Driver 'pcspkr' is already registered, aborting...
[  121.108617] dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.4)
[  121.113037] tg3 0000:05:00.1 eno2: renamed from eth1

[...]

[  121.281600] shepherd[1]: Service user-homes has been started.
[  121.282538] shepherd[1]: Service user-homes started.
[  121.368316] ipmi_si IPI0001:00: Using irq 10
[  121.405790] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
[  121.419871] shepherd[1]: Exception caught while starting #<<service> 7f19889012a0>: (wrong-type-arg "port-filename" "Wrong type argument in position ~A: ~S" (1 #<closed: file 7f1981887000>) (#<closed: file 7f1981887000>))
[  121.420074] shepherd[1]: Service user-homes running with value #t.
[  121.420218] shepherd[1]: Service networking failed to start.
--8<---------------cut here---------------end--------------->8---

The failure seems to happen after the whole static networking config has
been set up though (‘ip a’ shows that everything’s in place).

Problem is that at this point ‘networking’ cannot be started unless you
manually tear down everything with ‘ip’:

--8<---------------cut here---------------start------------->8---
$ sudo herd start networking
herd: error: exception caught while executing 'start' on service 'networking':
Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.
--8<---------------cut here---------------end--------------->8---

(17 = EEXIST)

This makes me think we should make the setup phase idempotent or,
alternatively, add special actions to force a change.
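
To illustrate what "idempotent" would mean here, the following is a
minimal sketch at the level of ip(8) commands.  It is an assumption made
for illustration only: Guix sets up static networking from Guile over
Netlink, not by calling ‘ip’, and ‘setup_network’ with its device,
address and gateway arguments is a hypothetical helper.  The point is
that "replace" operations succeed whether or not the entry already
exists, whereas "add" fails with EEXIST on a second run.

```shell
# Sketch only: not the code Guix runs.  All arguments are placeholders.
# $1 = device, $2 = address/prefix, $3 = default gateway.
# "replace" creates the entry if it is missing and updates it otherwise,
# so running this twice does not fail with "File exists" (EEXIST).
setup_network() {
    ip link set "$1" up &&
    ip address replace "$2" dev "$1" &&
    ip route replace default via "$3" dev "$1"
}
```

It could be invoked as root as ‘setup_network eno2 10.0.0.2/24 10.0.0.1’,
and invoking it a second time should leave the configuration unchanged.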

Thoughts?

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#64653; Package guix. (Sun, 17 Sep 2023 16:44:01 GMT)

Message #8 received at 64653 <at> debbugs.gnu.org:

From: Matt Wette <matt.wette <at> gmail.com>
To: 64653 <at> debbugs.gnu.org
Subject: stopping ntp and dnsmasq
Date: Sun, 17 Sep 2023 09:42:45 -0700
Are there any workarounds for this?  I've been digging for anything that
would help.
I'm dead in the water trying to get ntpd and tftpd (dnsmasq) working.
They require this.
Or is there a way to get dnsmasq working by itself?

Matt





Information forwarded to bug-guix <at> gnu.org:
bug#64653; Package guix. (Sun, 17 Sep 2023 17:10:01 GMT)

Message #11 received at 64653 <at> debbugs.gnu.org:

From: Matt Wette <matt.wette <at> gmail.com>
To: 64653 <at> debbugs.gnu.org
Subject: Re: stopping ntp and dnsmasq
Date: Sun, 17 Sep 2023 10:09:03 -0700
On 9/17/23 9:42 AM, Matt Wette wrote:
> Are there any workarounds for this.   I've been digging into anything 
> to help.
> I'm dead in the water trying to get ntpd and tftpd (dnsmasq) working.  
> They require this.
> Or, is there a way to get dnsmasq working itself?

I see there is atftp, so I'll try that.   Still no working ntpd.




Severity set to 'important' from 'normal'. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 02 Oct 2023 10:25:01 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#64653; Package guix. (Mon, 02 Oct 2023 12:00:02 GMT)

Message #16 received at 64653 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 64653 <at> debbugs.gnu.org
Cc: Christopher Baines <guix <at> cbaines.net>, Matt Wette <matt.wette <at> gmail.com>
Subject: Re: bug#64653: ‘static-networking’ fails to start
Date: Mon, 02 Oct 2023 13:59:13 +0200
Ludovic Courtès <ludovic.courtes <at> inria.fr> skribis:

> [  121.281600] shepherd[1]: Service user-homes has been started.
> [  121.282538] shepherd[1]: Service user-homes started.
> [  121.368316] ipmi_si IPI0001:00: Using irq 10
> [  121.405790] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
> [  121.419871] shepherd[1]: Exception caught while starting #<<service> 7f19889012a0>: (wrong-type-arg "port-filename" "Wrong type argument in position ~A: ~S" (1 #<closed: file 7f1981887000>) (#<closed: file 7f1981887000>))
> [  121.420074] shepherd[1]: Service user-homes running with value #t.
> [  121.420218] shepherd[1]: Service networking failed to start.
>
>
> The failure seems to happen after the whole static networking config has
> been set up though (‘ip a’ shows that everything’s in place).
>
> Problem is that at this point ‘networking’ cannot be started unless you
> manually tear down everything with ‘ip’:
>
> $ sudo herd start networking
> herd: error: exception caught while executing 'start' on service 'networking':
> Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.

Quick workaround if you encounter this bug:

  1. Find the “tear-down” script of your system with:

       guix gc -R /run/current-system |grep tear-down-network

  2. In a ‘screen’ session, run this as root:

       while true ; do herd enable networking; herd start networking; sleep 3; done

  3. Run:

       sudo guile --no-auto-compile TEAR_DOWN_SCRIPT_FROM_STEP_1

Beautiful, isn’t it?

(We’ll actually work on fixing the bug, too…)
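
Steps 1 and 3 above can be sketched as two small shell functions.  This
is a hedged convenience wrapper, not something shipped with Guix:
‘find_tear_down_script’ and ‘run_tear_down’ are hypothetical names, and
the sketch assumes that the first matching store item is the script you
want.  The retry loop from step 2 should already be running in ‘screen’
before the tear-down is triggered, since the network goes away in
between.

```shell
# Step 1: read `guix gc -R /run/current-system` output on stdin and print
# the first store item whose name mentions the tear-down script.
find_tear_down_script() {
    grep -- 'tear-down-network' | head -n 1
}

# Step 3: run the tear-down script, refusing to do anything when step 1
# found nothing.  Only call this while the step 2 loop is active.
run_tear_down() {
    script=$(guix gc -R /run/current-system | find_tear_down_script)
    [ -n "$script" ] || { echo "no tear-down script found" >&2; return 1; }
    sudo guile --no-auto-compile "$script"
}
```

Neither function does anything until it is called; ‘run_tear_down’ is
the only one that changes system state.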

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#64653; Package guix. (Sat, 11 Nov 2023 16:27:02 GMT)

Message #19 received at 64653 <at> debbugs.gnu.org:

From: Leo Nikkilä <hello <at> lnikki.la>
To: 64653 <at> debbugs.gnu.org
Subject: Re: bug#64653: ‘static-networking’ fails to start
Date: Sat, 11 Nov 2023 16:25:42 +0000
I'm also seeing this issue on a headless RockPro64 system. Do you know anything I could change in the configuration to work around this during boot, e.g. patch a specific commit out?

Happy to provide further details or test things on my system.




Information forwarded to bug-guix <at> gnu.org:
bug#64653; Package guix. (Wed, 03 Jan 2024 23:44:02 GMT)

Message #22 received at 64653 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 64653 <at> debbugs.gnu.org
Subject: Re: bug#64653: ‘static-networking’ fails to start
Date: Thu, 04 Jan 2024 00:42:51 +0100
Hello!

Ludovic Courtès <ludovic.courtes <at> inria.fr> skribis:

> [  121.282538] shepherd[1]: Service user-homes started.
> [  121.368316] ipmi_si IPI0001:00: Using irq 10
> [  121.405790] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
> [  121.419871] shepherd[1]: Exception caught while starting #<<service> 7f19889012a0>: (wrong-type-arg "port-filename" "Wrong type argument in position ~A: ~S" (1 #<closed: file 7f1981887000>) (#<closed: file 7f1981887000>))
> [  121.420074] shepherd[1]: Service user-homes running with value #t.
> [  121.420218] shepherd[1]: Service networking failed to start.

I’m seeing a similar exception in a Hurd VM running shepherd 0.10.3rc1:

--8<---------------cut here---------------start------------->8---
Jan  3 23:13:22 localhost shepherd[1]: Exception caught while starting networking: (wrong-type-arg "port-filename" "Wrong type argument in position ~A: ~S" (1 #<closed: file 207e498>) (#<closed: file 207e498>)) 
Jan  3 23:13:22 localhost shepherd[1]: Service networking failed to start. 
--8<---------------cut here---------------end--------------->8---

It’s interesting because it suggests that the offending ‘port-filename’
call comes from ‘load’, not from the network-setup code being loaded
(here, the /hurd/pfinet translator has been properly set up).

Looking at the code in ‘boot-9.scm’, I *think* we end up calling
‘primitive-load’; ‘shepherd’ replaces it with its own (@ (shepherd
support) primitive-load*).

I managed to grab this backtrace:

--8<---------------cut here---------------start------------->8---
Evaluating user expression (catch #t (lambda () (load "/gnu/store/64?")) # ?).
starting '/gnu/store/gn8q7p790a9zdnlciyp1vlncpin366r0-hurd-v0.9.git20230216/hurd/pfinet "--ipv6" "/servers/socket/26" "--interface" "/dev/eth0" "--address" "10.0.2.15" "--netmask" "255.255.255.0" "--gateway" "10.0.2.2"'
In ice-9/boot-9.scm:
    142:2  7 (dynamic-wind #<procedure 20393a0 at ice-9/eval.scm:33?> ?)
In shepherd/support.scm:
   486:15  6 (_ #<closed: file 50a7e38>)
In ice-9/read.scm:
   859:19  5 (read _)
In unknown file:
           4 (port-filename #<closed: file 50a7e38>)
In ice-9/boot-9.scm:
  1685:16  3 (raise-exception _ #:continuable? _)
  1780:13  2 (_ #<&compound-exception components: (#<&assertion-fail?>)
In ice-9/eval.scm:
    159:9  1 (_ #(#(#<module (#{ g171}#) 3cd25f0>) (# "port-fil?" ?)))
In unknown file:
           0 (make-stack #t)
#t
--8<---------------cut here---------------end--------------->8---

So it’s indeed ‘read’ as called from ‘primitive-load*’ that stumbles
upon a closed port.  It also happens when loading a file that simply
suspends the current fiber via ‘sleep’ or similar, but only on the Hurd
though.

To be continued…

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Fri, 05 Jan 2024 16:33:02 GMT)

Notification sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
bug acknowledged by developer. (Fri, 05 Jan 2024 16:33:02 GMT)

Message #27 received at 64653-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 64653-done <at> debbugs.gnu.org
Subject: Re: bug#64653: ‘static-networking’ fails to start
Date: Fri, 05 Jan 2024 17:32:18 +0100
Hi!

Ludovic Courtès <ludo <at> gnu.org> skribis:

> Evaluating user expression (catch #t (lambda () (load "/gnu/store/64?")) # ?).
> starting '/gnu/store/gn8q7p790a9zdnlciyp1vlncpin366r0-hurd-v0.9.git20230216/hurd/pfinet "--ipv6" "/servers/socket/26" "--interface" "/dev/eth0" "--address" "10.0.2.15" "--netmask" "255.255.255.0" "--gateway" "10.0.2.2"'
> In ice-9/boot-9.scm:
>     142:2  7 (dynamic-wind #<procedure 20393a0 at ice-9/eval.scm:33?> ?)
> In shepherd/support.scm:
>    486:15  6 (_ #<closed: file 50a7e38>)
> In ice-9/read.scm:
>    859:19  5 (read _)
> In unknown file:
>            4 (port-filename #<closed: file 50a7e38>)
> In ice-9/boot-9.scm:
>   1685:16  3 (raise-exception _ #:continuable? _)
>   1780:13  2 (_ #<&compound-exception components: (#<&assertion-fail?>)
> In ice-9/eval.scm:
>     159:9  1 (_ #(#(#<module (#{ g171}#) 3cd25f0>) (# "port-fil?" ?)))
> In unknown file:
>            0 (make-stack #t)
> #t
>
> So it’s indeed ‘read’ as called from ‘primitive-load*’ that stumbles
> upon a closed port.

Good news: this is fixed by 4e431fda5f2ec76b6d6a271be7c30b1324431329!
Silly me had introduced a ‘dynamic-wind’ there.

(The funny thing with extensible systems like the Shepherd is that the
problem can be anywhere.  :-))

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#64653; Package guix. (Sat, 20 Jan 2024 21:15:02 GMT)

Message #30 received at 64653 <at> debbugs.gnu.org:

From: Matt Wette <matt.wette <at> gmail.com>
To: 64653 <at> debbugs.gnu.org
Subject: works now
Date: Sat, 20 Jan 2024 13:14:12 -0800
This bug no longer occurs on my system.  That change occurred over the
last week.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 18 Feb 2024 12:24:15 GMT)

bug unarchived. Request was from Felix Lechner <felix.lechner <at> lease-up.com> to control <at> debbugs.gnu.org. (Mon, 25 Mar 2024 15:37:01 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#64653; Package guix. (Mon, 25 Mar 2024 15:54:01 GMT)

Message #37 received at 64653 <at> debbugs.gnu.org:

From: Fabio Natali <me <at> fabionatali.com>
To: 64653 <at> debbugs.gnu.org
Subject: 'static-networking' fails to start
Date: Mon, 25 Mar 2024 11:52:12 +0000
Hi,

I've been trying to reconfigure a machine from static IPv4 to static
dual-stack or IPv6-only. I followed one⁰ of the examples in the manual,
so I'd think I got the syntax right.

Once the reconfiguration has taken place and when restarting the
networking service, I get this error:

,----
| herd: error: exception caught while executing 'start' on service 'networking':
| Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.
`----

This would seem to be relevant to this bug report 64653?

Do you know what this might be related to and what I can do to solve it?

This happens on an up-to-date Guix system.

Thanks, best wishes, Fabio.

⁰ https://guix.gnu.org/manual/devel/en/html_node/Networking-Setup.html#index-static_002dnetworking


-- 
Fabio Natali
https://fabionatali.com




Information forwarded to bug-guix <at> gnu.org:
bug#64653; Package guix. (Mon, 25 Mar 2024 18:44:02 GMT)

Message #40 received at 64653 <at> debbugs.gnu.org:

From: Fabio Natali <me <at> fabionatali.com>
To: 64653 <at> debbugs.gnu.org
Subject: Re: 'static-networking' fails to start
Date: Mon, 25 Mar 2024 18:43:13 +0000
On 2024-03-25, 11:52 +0000, Fabio Natali <me <at> fabionatali.com> wrote:
> Once the reconfiguration has taken place and when restarting the
> networking service, I get this error:
>
> ,----
> | herd: error: exception caught while executing 'start' on service 'networking':
> | Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.
> `----

Ok, good news, thanks to Felix's advice[0] I was able to get this
sorted!

Apparently, specifying a default IPv6 gateway (as a link local address)
is what was causing the issue for me. Once the following bit was
commented out, everything started working again.

,----
| (static-networking
|  (addresses (list (network-address
|                    (device "eth0")
|                    (value "10.0.0.2/24"))
|                   (network-address
|                    (device "eth0")
|                    (value "2001:db8::1/64"))))
|  (routes (list (network-route
|                 (destination "default")
|                 (gateway "10.0.0.1"))))
| ;;                (network-route
| ;;                 (destination "default")
| ;;                 (gateway "fe80::"))))
|  (name-servers '("10.0.0.1" "2001:db8::")))
`----

("fe80::" and "2001:db8::" are just placeholders.)

I assume the router address gets retrieved automatically via Router
Advertisement (RA), so no need for that in my case.

Still, I'd expect it to be possible to indicate the router's link-local
address... Do you see a possible bug here or is there anything else that
I might be missing?
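
One way to test the Router Advertisement assumption is to look at the
IPv6 default routes the kernel already holds before ‘networking’ is
started.  The sketch below is only a diagnostic aid:
‘has_ra_default_route’ is a hypothetical helper, and nothing in this
thread confirms that an RA-learned route is what triggers errno 17.  It
relies on ip(8) labelling such routes with "proto ra".

```shell
# Read `ip -6 route show default` output on stdin; succeed if any default
# route is marked as learned from a Router Advertisement ("proto ra").
has_ra_default_route() {
    grep -Eq '^default .* proto ra( |$)'
}

# Example use (left commented out, since it queries the live system):
# ip -6 route show default | has_ra_default_route &&
#     echo "a default route was already installed via RA"
```

If the check succeeds, a default IPv6 route is already present, which
would be consistent with the explanation above without proving it.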

Thanks, cheers, Fabio.


[0] https://lists.gnu.org/archive/html/help-guix/2024-03/msg00132.html


-- 
Fabio Natali
https://fabionatali.com




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 23 Apr 2024 11:24:54 GMT)
