GNU bug report logs - #58926
Shepherd becomes unresponsive after an interrupt

Package: guix

Reported by: Mathieu Othacehe <othacehe <at> gnu.org>

Date: Mon, 31 Oct 2022 12:45:01 UTC

Severity: important

Merged with 56674

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 58926 in the body.
You can then email your comments to 58926 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-guix <at> gnu.org:
bug#58926; Package guix. (Mon, 31 Oct 2022 12:45:01 GMT)

Acknowledgement sent to Mathieu Othacehe <othacehe <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 31 Oct 2022 12:45:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: bug-guix <at> gnu.org
Subject: Shepherd becomes unresponsive after an interrupt
Date: Mon, 31 Oct 2022 13:44:46 +0100

Hello,

When running the following command:

--8<---------------cut here---------------start------------->8---
sudo herd restart service-that-hangs-upon-restart
--8<---------------cut here---------------end--------------->8---

then hitting C-c, Shepherd becomes totally unresponsive:

--8<---------------cut here---------------start------------->8---
sudo herd status
--8<---------------cut here---------------end--------------->8---

and all further Shepherd commands hang forever. I was able to reproduce
it in two different configurations:

1. On my laptop with a Wireguard service trying to reach a non-existing
DNS server.

--8<---------------cut here---------------start------------->8---
            (service wireguard-service-type
                     (wireguard-configuration
                      (addresses (list "10.0.0.2/24"))
                      (dns '("10.0.0.50")) ; does not exist
--8<---------------cut here---------------end--------------->8---

2. On Berlin, while trying to restart nginx.

In both situations, the "reboot" command was also hanging.
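
A hang like this can be detected from a script without blocking the terminal. The following is a minimal sketch, assuming GNU coreutils `timeout` is available; `responds_within` is a hypothetical helper name and is not part of the Shepherd or Guix.

```shell
# Hypothetical helper, not part of the Shepherd: run a command under a
# deadline and report whether it finished in time.
# Assumption: GNU coreutils `timeout` is on PATH.
responds_within() {
    rw_limit=$1
    shift
    rw_status=0
    # Discard the command's output; only its timing matters here.
    timeout "$rw_limit" "$@" >/dev/null 2>&1 || rw_status=$?
    # GNU timeout exits with status 124 when the deadline expires.
    [ "$rw_status" -ne 124 ]
}
```

Run as root, `responds_within 5 herd status || echo "shepherd did not answer within 5 seconds"` would print the message on an affected system instead of hanging. In this sketch any exit other than a timeout, including an error, counts as a response.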

Thanks,

Mathieu




Added indication that bug 58926 blocks 53214. Request was from Mathieu Othacehe <mathieu <at> meije.mail-host-address-is-not-set> to control <at> debbugs.gnu.org. (Mon, 31 Oct 2022 13:37:02 GMT)

Severity set to 'important' from 'normal'. Request was from Mathieu Othacehe <mathieu <at> meije.mail-host-address-is-not-set> to control <at> debbugs.gnu.org. (Mon, 31 Oct 2022 13:38:02 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#58926; Package guix. (Thu, 10 Nov 2022 10:00:02 GMT)

Message #12 received at 58926 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 58926 <at> debbugs.gnu.org
Subject: Re: bug#58926: Shepherd becomes unresponsive after an interrupt
Date: Thu, 10 Nov 2022 10:59:23 +0100

Hi,

Mathieu Othacehe <othacehe <at> gnu.org> wrote:

> sudo herd restart service-that-hangs-upon-restart
>
>
> then hitting C-c, Shepherd becomes totally unresponsive:
>
> sudo herd status
>
>
> and all further Shepherd commands hang forever. I was able to reproduce
> it in two different configurations:
>
> 1. On my laptop with a Wireguard service trying to reach a non-existing
> DNS server.
>
>             (service wireguard-service-type
>                      (wireguard-configuration
>                       (addresses (list "10.0.0.2/24"))
>                       (dns '("10.0.0.50")) ; does not exist
>
> 2. On Berlin, while trying to restart nginx.

I experienced case #2: in that case ‘strace -p1’ showed that shepherd
was stuck on waitpid of the nginx process, which was not terminating.
Killing that process would unlock shepherd.

This might be <https://issues.guix.gnu.org/56674>.

Would be good to see what’s up with WireGuard.

Ludo’.




Merged 56674 58926. Request was from Mathieu Othacehe <mathieu <at> meije.mail-host-address-is-not-set> to control <at> debbugs.gnu.org. (Sat, 12 Nov 2022 08:37:02 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#58926; Package guix. (Sat, 12 Nov 2022 18:12:02 GMT)

Message #17 received at 58926 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 53225 <at> debbugs.gnu.org, 58926 <at> debbugs.gnu.org
Subject: Re: bug#58926: Shepherd becomes unresponsive after an interrupt
Date: Sat, 12 Nov 2022 19:10:56 +0100

Mathieu Othacehe <othacehe <at> gnu.org> wrote:

> 1. On my laptop with a Wireguard service trying to reach a non-existing
> DNS server.
>
>             (service wireguard-service-type
>                      (wireguard-configuration
>                       (addresses (list "10.0.0.2/24"))
>                       (dns '("10.0.0.50")) ; does not exist

This one is similar to:

  https://issues.guix.gnu.org/53225
  https://issues.guix.gnu.org/53381

It has to do with the fact that “wg-quick up” blocks until it succeeds
and that ‘invoke’ gets stuck on ‘waitpid’ until the “wg-quick” process
terminates.

The solution will be to use something non-blocking instead of ‘invoke’;
I’m looking into it.
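
The difference between blocking on a child and supervising it without blocking can be sketched in shell. This is only an illustration of the general idea under stated assumptions, not the Shepherd's actual code; `run_with_deadline` is a hypothetical helper, and any long-running command can stand in for one that does not return. Instead of calling `wait` right away, which blocks until the child exits much as ‘waitpid’ does, the caller polls with `kill -0` and gives up after a deadline:

```shell
# Hypothetical helper, for illustration only (this is not Shepherd code).
# Runs a command in the background and polls it, rather than blocking in
# `wait` until the child terminates.
run_with_deadline() {
    rd_deadline=$1          # whole seconds to wait before giving up
    shift
    "$@" &                  # start the child without waiting for it
    rd_child=$!
    rd_elapsed=0
    # `kill -0` sends no signal; it only tests whether the process exists,
    # so each check returns immediately.
    while kill -0 "$rd_child" 2>/dev/null; do
        if [ "$rd_elapsed" -ge "$rd_deadline" ]; then
            kill "$rd_child" 2>/dev/null || true   # stop the stuck child
            wait "$rd_child" 2>/dev/null || true   # reap it
            return 124                             # same code GNU timeout uses
        fi
        sleep 1
        rd_elapsed=$((rd_elapsed + 1))
    done
    # The child has already exited, so this `wait` returns its status at once.
    wait "$rd_child"
}
```

For instance, `run_with_deadline 1 sleep 10` returns 124 after about one second, while `run_with_deadline 5 true` returns 0. Polling once per second is a simplification chosen to keep the sketch short.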

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#58926; Package guix. (Sat, 12 Nov 2022 18:29:02 GMT)

Message #20 received at 58926 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 58926 <at> debbugs.gnu.org
Subject: Re: bug#58926: Shepherd becomes unresponsive after an interrupt
Date: Sat, 12 Nov 2022 19:28:33 +0100

Mathieu Othacehe <othacehe <at> gnu.org> wrote:

> then hitting C-c, Shepherd becomes totally unresponsive:
>
> sudo herd status
>
>
> and all further Shepherd commands hang forever. I was able to reproduce
> it in two different configurations:

[...]

> 2. On Berlin, while trying to restart nginx.

I can’t reproduce it in a VM.

Before I try it on a production system :-), does anyone have a tip on
how to reproduce it?  Or perhaps strace output from a system that
exhibits this bug?

TIA!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#58926; Package guix. (Mon, 14 Nov 2022 16:33:02 GMT)

Message #23 received at 58926 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 56674 <at> debbugs.gnu.org
Cc: Mathieu Othacehe <othacehe <at> gnu.org>, 58926 <at> debbugs.gnu.org
Subject: Re: bug#58926: Shepherd becomes unresponsive after an interrupt
Date: Mon, 14 Nov 2022 17:32:35 +0100

Hello!

Ludovic Courtès <ludo <at> gnu.org> wrote:

> These fresh Shepherd commits install a non-blocking ‘system*’ replacement:
>
>   975b0aa service: Provide a non-blocking replacement of 'system*'.
>   039c7a8 service: Spawn a fiber responsible for process monitoring.
>
> We’ll have to do more testing and probably go for a 0.9.3 release soon.

Shepherd commit ada88074f0ab7551fd0f3dce8bf06de971382e79 passes my
tests.  It definitely solves the wireguard example and similar things
(uses of ‘system*’ in service constructors/destructors); I can’t tell
for sure about nginx because I haven’t been able to reproduce it in a
VM.  I’m interested in ways to reproduce it.

It does look like we could go with 0.9.3 real soon now.

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Thu, 17 Nov 2022 10:24:03 GMT)

Notification sent to Mathieu Othacehe <othacehe <at> gnu.org>:
bug acknowledged by developer. (Thu, 17 Nov 2022 10:24:03 GMT)

Message #28 received at 58926-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 53225-done <at> debbugs.gnu.org, 58926-done <at> debbugs.gnu.org
Subject: Re: bug#58926: Shepherd becomes unresponsive after an interrupt
Date: Thu, 17 Nov 2022 11:23:09 +0100

Hi,

Ludovic Courtès <ludo <at> gnu.org> wrote:

> Mathieu Othacehe <othacehe <at> gnu.org> wrote:
>
>> 1. On my laptop with a Wireguard service trying to reach a non-existing
>> DNS server.
>>
>>             (service wireguard-service-type
>>                      (wireguard-configuration
>>                       (addresses (list "10.0.0.2/24"))
>>                       (dns '("10.0.0.50")) ; does not exist
>
> This one is similar to:
>
>   https://issues.guix.gnu.org/53225
>   https://issues.guix.gnu.org/53381
>
> It has to do with the fact that “wg-quick up” blocks until it succeeds
> and that ‘invoke’ gets stuck on ‘waitpid’ until the “wg-quick” process
> terminates.
>
> The solution will be to use something non-blocking instead of ‘invoke’;
> I’m looking into it.

This is fixed in Shepherd 0.9.3, which landed in Guix commit
283d7318c5b312d7129adb6dbeea6ad205ce89d1.

As I wrote, I’m not sure whether it fixes the nginx situation since I
could not reproduce it.  I’m closing this bug; let’s open a new issue
specifically for nginx if it comes up again with 0.9.3.

Thanks,
Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Thu, 17 Nov 2022 10:24:04 GMT)

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Thu, 17 Nov 2022 10:24:04 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 15 Dec 2022 12:24:07 GMT)

This bug report was last modified 1 year and 127 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.