GNU bug report logs - #65419
[Shepherd] Non-responding service control fiber


Package: guix;

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Mon, 21 Aug 2023 09:39:02 UTC

Severity: important

Merged with 65178

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.




Report forwarded to mail <at> cbaines.net, bug-guix <at> gnu.org:
bug#65419; Package guix. (Mon, 21 Aug 2023 09:39:02 GMT)

Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
New bug report received and forwarded. Copy sent to mail <at> cbaines.net, bug-guix <at> gnu.org. (Mon, 21 Aug 2023 09:39:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: bug-guix <at> gnu.org
Subject: [Shepherd] Non-reponding service control fiber
Date: Mon, 21 Aug 2023 11:38:12 +0200
Hello,

On milano-guix-1 (a build machine behind bayfront, running shepherd
0.10.2), ‘herd status’ and ‘herd status guix-build-coordinator-agent’
would hang (there’s no ‘guix-build-coordinator’ process running).

‘herd stop childhurd2’ hangs and has no effect.

Conversely, ‘herd status nscd’ and similar commands for most other
services work fine.  When a service’s process is terminated, the service
gets respawned just fine.

The conclusion seems to be that the control fiber of the ‘root’ service
is not responding: is it blocked on a get/put? Did it exit?

Unfortunately we don’t have data from the logs that would give clues as
to what went wrong.

Ludo’.
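
The symptom reported above (some ‘herd’ invocations hang while others answer) can be probed without blocking the terminal by bounding each command with a time limit. The following is only a minimal sketch, not something taken from the thread: the `probe` function name and the 10-second limit are illustrative assumptions, and it relies on timeout(1) from GNU coreutils exiting with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper (not part of Shepherd): run a command under a time
# limit and report whether it answered, failed, or had to be interrupted.
probe() {
    limit=$1
    shift
    # timeout(1) sends TERM once the limit expires and then exits with 124.
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hung: $*"
    elif [ "$status" -eq 0 ]; then
        echo "ok: $*"
    else
        echo "failed ($status): $*"
    fi
}

# Possible use on the affected machine, with service names from the report:
#   probe 10 herd status nscd
#   probe 10 herd status guix-build-coordinator-agent
```

A "hung" result only shows that the command did not return in time; it does not say why the control fiber stopped answering.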




Information forwarded to bug-guix <at> gnu.org:
bug#65419; Package guix. (Wed, 23 Aug 2023 08:01:02 GMT)

Message #8 received at 65419 <at> debbugs.gnu.org:

From: Giovanni Biscuolo <g <at> xelera.eu>
To: Ludovic Courtès <ludovic.courtes <at> inria.fr>,
 65419 <at> debbugs.gnu.org
Cc: Christopher Baines <mail <at> cbaines.net>
Subject: Re: bug#65419: [Shepherd] Non-reponding service control fiber
Date: Wed, 23 Aug 2023 10:00:15 +0200
Hello,

Ludovic Courtès <ludovic.courtes <at> inria.fr> writes:

[...]

> The conclusion seems to be that the control fiber of the ‘root’ service
> is not responding: is it blocked on a get/put? Did it exit?
>
> Unfortunately we don’t have data from the logs that would give clues as
> to what went wrong.

I've had a look at /var/log/messages but nothing seems wrong except
messages like this one:

--8<---------------cut here---------------start------------->8---

Aug 21 14:48:42 localhost shepherd[1]: 6 connections still in use after sshd-13752 termination. 
Aug 21 14:48:42 localhost shepherd[1]: Service sshd-13752 (PID 29977) exited with 255. 
Aug 21 14:48:42 localhost shepherd[1]: Service sshd-13752 has been disabled. 
Aug 21 14:48:42 localhost shepherd[1]: Transient service sshd-13752 terminated, now unregistered. 

--8<---------------cut here---------------end--------------->8---

Would it be useful to configure the monitoring service [1] on
milano-guix-1 so that we have useful data in the logs if we hit a
similar issue?

Thanks, Gio'


[1] https://www.gnu.org/software/shepherd/manual/shepherd.html#Monitoring-Service

-- 
Giovanni Biscuolo

Xelera IT Infrastructures

Information forwarded to bug-guix <at> gnu.org:
bug#65419; Package guix. (Thu, 24 Aug 2023 08:10:02 GMT)

Message #11 received at 65419 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: Giovanni Biscuolo <g <at> xelera.eu>
Cc: 65419 <at> debbugs.gnu.org, Christopher Baines <mail <at> cbaines.net>
Subject: Re: bug#65419: [Shepherd] Non-reponding service control fiber
Date: Thu, 24 Aug 2023 10:09:02 +0200
Hi,

Giovanni Biscuolo <g <at> xelera.eu> skribis:

> I've had a look at /var/log/messages but nothing seems wrong except
> messages like this one:
>
>
> Aug 21 14:48:42 localhost shepherd[1]: 6 connections still in use after sshd-13752 termination. 
> Aug 21 14:48:42 localhost shepherd[1]: Service sshd-13752 (PID 29977) exited with 255. 
> Aug 21 14:48:42 localhost shepherd[1]: Service sshd-13752 has been disabled. 
> Aug 21 14:48:42 localhost shepherd[1]: Transient service sshd-13752 terminated, now unregistered. 

Yeah, I think it happened earlier, but unfortunately the previous logs
got deleted (rottlog is not behaving as expected).

> Would it be useful to configure the monitoring service [1] on
> milano-guix-1 so that we have useful data in the logs if we hit a
> similar issue?

It wouldn’t help in this case, but it’s still interesting to have it
around.

  sudo herd eval root '(begin (use-modules (shepherd service monitoring)) (register-services (list (monitoring-service))))'
  sudo herd start monitoring

Ludo’.




Merged 65178 65419. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 03 Sep 2023 20:00:03 GMT)

Severity set to 'important' from 'normal'. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 03 Sep 2023 20:00:03 GMT)

Changed bug title to '[Shepherd] Non-responding service control fiber' from '[Shepherd] Non-reponding service control fiber'. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 23 Nov 2023 20:43:02 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#65419; Package guix. (Tue, 19 Dec 2023 23:01:04 GMT)

Message #20 received at 65419 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Attila Lendvai <attila <at> lendvai.name>
Cc: 65419 <at> debbugs.gnu.org, 67538 <at> debbugs.gnu.org, 67230 <at> debbugs.gnu.org,
 65178 <at> debbugs.gnu.org, Timo Wilken <guix <at> twilken.net>
Subject: Re: bug#65419: [Shepherd] Non-responding service control fiber
Date: Wed, 20 Dec 2023 00:00:36 +0100
Hello,

Attila Lendvai <attila <at> lendvai.name> skribis:

> i think i have found the root cause of this, as documented here: https://issues.guix.gnu.org/67839
>
> that issue contains patches for shepherd to reproduce it in its test suite.

Yes, it looks like this long-standing and hard-to-debug issue may well
be fixed now, thumbs up Attila!!

We have accumulated quite a few fixes by now so I think I’ll release
0.10.3 hopefully in 2023 and otherwise soon after.

Thanks,
Ludo’.




Bug closed; send any further explanations to 65419 <at> debbugs.gnu.org and Ludovic Courtès <ludovic.courtes <at> inria.fr>. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 02 Jan 2024 22:11:02 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 31 Jan 2024 12:24:07 GMT)



