GNU bug report logs -
#65419
[Shepherd] Non-responding service control fiber
Report forwarded to mail <at> cbaines.net, bug-guix <at> gnu.org: bug#65419; Package guix. (Mon, 21 Aug 2023 09:39:02 GMT)
Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>: New bug report received and forwarded. Copy sent to mail <at> cbaines.net, bug-guix <at> gnu.org. (Mon, 21 Aug 2023 09:39:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
On milano-guix-1 (a build machine behind bayfront, running shepherd
0.10.2), ‘herd status’ and ‘herd status guix-build-coordinator-agent’
would hang (there’s no ‘guix-build-coordinator’ process running).
‘herd stop childhurd2’ hangs and has no effect.
Conversely, ‘herd status nscd’ and similar commands for most other services
work fine. When terminating a service’s process, the service gets respawned
just fine.
The conclusion seems to be that the control fiber of the ‘root’ service
is not responding: it is blocked on a get/put? did it exit?
Unfortunately we don’t have data from the logs that would give clues as
to what went wrong.
Ludo’.
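A minimal sketch for telling hanging ‘herd’ invocations apart from responding ones, as in the symptoms described above. It assumes GNU coreutils ‘timeout’ is available; the ‘probe’ helper name and the 10-second limit are illustrative only and are not part of the Shepherd.

```shell
#!/bin/sh
# probe LIMIT CMD...: run CMD with a time limit (in seconds) and report
# whether it answered.  GNU coreutils `timeout` exits with status 124
# when the limit expires, which is used here as the "hang" signal.
probe() {
    limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: $*"
    elif [ "$status" -eq 0 ]; then
        echo "OK: $*"
    else
        echo "FAIL($status): $*"
    fi
    return "$status"
}

# Possible use on the affected machine (commands taken from the report;
# the 10-second limit is an arbitrary assumption):
#   probe 10 herd status nscd
#   probe 10 herd status guix-build-coordinator-agent
#   probe 10 herd status
```

A "HANG" line for ‘herd status’ alongside "OK" lines for individual services would match the pattern reported here, though it does not by itself say which fiber is blocked.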
Information forwarded to bug-guix <at> gnu.org: bug#65419; Package guix. (Wed, 23 Aug 2023 08:01:02 GMT)
Message #8 received at 65419 <at> debbugs.gnu.org (full text, mbox):
Hello,
Ludovic Courtès <ludovic.courtes <at> inria.fr> writes:
[...]
> The conclusion seems to be that the control fiber of the ‘root’ service
> is not responding: it is blocked on a get/put? did it exit?
>
> Unfortunately we don’t have data from the logs that would give clues as
> to what went wrong.
I've had a look at /var/log/messages but nothing seems wrong except
messages like this one:
--8<---------------cut here---------------start------------->8---
Aug 21 14:48:42 localhost shepherd[1]: 6 connections still in use after sshd-13752 termination.
Aug 21 14:48:42 localhost shepherd[1]: Service sshd-13752 (PID 29977) exited with 255.
Aug 21 14:48:42 localhost shepherd[1]: Service sshd-13752 has been disabled.
Aug 21 14:48:42 localhost shepherd[1]: Transient service sshd-13752 terminated, now unregistered.
--8<---------------cut here---------------end--------------->8---
Is it useful configuring the monitoring service [1] on milano-guix-1 to
have useful data in the logs in case we get a similar issue?
Thanks, Gio'
[1] https://www.gnu.org/software/shepherd/manual/shepherd.html#Monitoring-Service
--
Giovanni Biscuolo
Xelera IT Infrastructures
Information forwarded to bug-guix <at> gnu.org: bug#65419; Package guix. (Thu, 24 Aug 2023 08:10:02 GMT)
Message #11 received at 65419 <at> debbugs.gnu.org (full text, mbox):
Hi,
Giovanni Biscuolo <g <at> xelera.eu> writes:
> I've had a look at /var/log/messages but nothing seems wrong except
> messages like this one:
>
>
> Aug 21 14:48:42 localhost shepherd[1]: 6 connections still in use after sshd-13752 termination.
> Aug 21 14:48:42 localhost shepherd[1]: Service sshd-13752 (PID 29977) exited with 255.
> Aug 21 14:48:42 localhost shepherd[1]: Service sshd-13752 has been disabled.
> Aug 21 14:48:42 localhost shepherd[1]: Transient service sshd-13752 terminated, now unregistered.
Yeah, I think it happened earlier but unfortunately the previous logs
got deleted (rottlog is not behaving as expected).
> Is it useful configuring the monitoring service [1] on milano-guix-1 to
> have useful data in the logs in case we get a similar issue?
It wouldn’t help in this case, but it’s still interesting to have it
around.
sudo herd eval root '(begin (use-modules (shepherd service monitoring)) (register-services (list (monitoring-service))))'
sudo herd start monitoring
Ludo’.
Merged 65178 65419.
Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 03 Sep 2023 20:00:03 GMT)
Severity set to 'important' from 'normal'
Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 03 Sep 2023 20:00:03 GMT)
Changed bug title to '[Shepherd] Non-responding service control fiber' from '[Shepherd] Non-reponding service control fiber'
Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 23 Nov 2023 20:43:02 GMT)
Information forwarded to bug-guix <at> gnu.org: bug#65419; Package guix. (Tue, 19 Dec 2023 23:01:04 GMT)
Message #20 received at 65419 <at> debbugs.gnu.org (full text, mbox):
Hello,
Attila Lendvai <attila <at> lendvai.name> writes:
> i think i have found the root cause of this, as documented here: https://issues.guix.gnu.org/67839
>
> that issue contains patches for shepherd to reproduce it in its test suite.
Yes, it looks like this long-standing and hard-to-debug issue may well
be fixed now, thumbs up Attila!!
We have accumulated quite a few fixes by now so I think I’ll release
0.10.3 hopefully in 2023 and otherwise soon after.
Thanks,
Ludo’.
bug closed, send any further explanations to
65419 <at> debbugs.gnu.org and Ludovic Courtès <ludovic.courtes <at> inria.fr>
Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 02 Jan 2024 22:11:02 GMT)
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 31 Jan 2024 12:24:07 GMT)
This bug report was last modified 1 year and 99 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.