GNU bug report logs - #41429
Shepherd Sometimes Crashes


Package: guix

Reported by: Katherine Cox-Buday <cox.katherine.e <at> gmail.com>

Date: Thu, 21 May 2020 03:00:02 UTC

Severity: important

Merged with 40981

Done: Mathieu Othacehe <mathieu <at> meru.i-did-not-set--mail-host-address--so-tickle-me>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 41429 in the body.
You can then email your comments to 41429 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-guix <at> gnu.org:
bug#41429; Package guix. (Thu, 21 May 2020 03:00:02 GMT)

Acknowledgement sent to Katherine Cox-Buday <cox.katherine.e <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Thu, 21 May 2020 03:00:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Katherine Cox-Buday <cox.katherine.e <at> gmail.com>
To: bug-guix <at> gnu.org
Subject: Shepherd Sometimes Crashes
Date: Wed, 20 May 2020 21:59:03 -0500
I am running shepherd as a userspace service manager on an alien distro.
Occasionally (often enough to cause concern), Shepherd is crashing.
I am unable to narrow down a cause, but anecdotally, it seems to happen
more often when a service it's managing fails repeatedly and is
disabled.

I'm running `strace` against the Shepherd process in an attempt to
submit a better bug report, but this is all I have for now. Maybe others
have also seen this behavior.

-- 
Katherine




Information forwarded to bug-guix <at> gnu.org:
bug#41429; Package guix. (Thu, 21 May 2020 12:16:01 GMT)

Message #8 received at 41429 <at> debbugs.gnu.org:

From: Efraim Flashner <efraim <at> flashner.co.il>
To: Katherine Cox-Buday <cox.katherine.e <at> gmail.com>
Cc: 41429 <at> debbugs.gnu.org
Subject: Re: bug#41429: Shepherd Sometimes Crashes
Date: Thu, 21 May 2020 15:14:43 +0300
On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
> I am running shepherd as a userspace service manager on an alien distro.
> Occasionally (often enough to cause concern), Shepherd is crashing.
> I am unable to narrow down a cause, but anecdotally, it seems to happen
> more often when a service it's managing fails repeatedly and is
> disabled.
> 
> I'm running `strace` against the Shepherd process in an attempt to
> submit a better bug report, but this is all I have for now. Maybe others
> have also seen this behavior.

I found it happens less often with shepherd-0.8. What version are you
running? Also possibly related, do you have mismatched versions of guile
between guix packages and your distro's native packages?

I've also sometimes found shepherd to crash when I add a service where
the start command is "wrong", as though the error were so bad that
shepherd says "Nope! That's it! I quit!"

I'd suggest looking at .config/shepherd/shepherd.log but it's rather
sparse. Still, it might have something useful.

-- 
Efraim Flashner   <efraim <at> flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

Information forwarded to bug-guix <at> gnu.org:
bug#41429; Package guix. (Thu, 21 May 2020 12:53:01 GMT)

Message #11 received at 41429 <at> debbugs.gnu.org:

From: Katherine Cox-Buday <cox.katherine.e <at> gmail.com>
To: Efraim Flashner <efraim <at> flashner.co.il>
Cc: 41429 <at> debbugs.gnu.org
Subject: Re: bug#41429: Shepherd Sometimes Crashes
Date: Thu, 21 May 2020 07:51:54 -0500
Efraim Flashner <efraim <at> flashner.co.il> writes:

> On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
>> I am running shepherd as a userspace service manager on an alien distro.
>> Occasionally (often enough to cause concern), Shepherd is crashing.
>> I am unable to narrow down a cause, but anecdotally, it seems to happen
>> more often when a service it's managing fails repeatedly and is
>> disabled.
>> 
>> I'm running `strace` against the Shepherd process in an attempt to
>> submit a better bug report, but this is all I have for now. Maybe others
>> have also seen this behavior.
>
> I found it happens less often with shepherd-0.8. What version are you
> running? Also possibly related, do you have mismatched versions of guile
> between guix packages and your distro's native packages?

Sorry, I forgot to include the version! I am running 0.8 from a store
which I update about once a week.

> I've also sometimes found shepherd to crash when I add a service where
> the start command is "wrong", as though the error were so bad that
> shepherd says "Nope! That's it! I quit!"

I'm doing very standard things with `make-forkexec-constructor`, so I
wouldn't expect any problems there.
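
For reference, a standard use of `make-forkexec-constructor` in a user-level Shepherd 0.8 configuration might look roughly like the sketch below. It is not the configuration from this report: the service name, command, and log path are hypothetical placeholders, and the `#:log-file` option is included only because per-service logs come up later in the thread.

```scheme
;; Sketch of a user-level Shepherd 0.8 configuration file, for example
;; ~/.config/shepherd/init.scm.  The service name, command, and log
;; path below are hypothetical placeholders.
(define example-daemon
  (make <service>
    #:provides '(example-daemon)
    #:docstring "Run a hypothetical long-lived daemon."
    ;; Fork and exec the command; its stdout and stderr are
    ;; redirected to the given log file.
    #:start (make-forkexec-constructor
             '("/usr/bin/example-daemon" "--foreground")
             #:log-file (string-append (getenv "HOME")
                                       "/.local/var/log/example-daemon.log"))
    ;; Stop the service by sending SIGTERM to its process.
    #:stop (make-kill-destructor)
    ;; Ask shepherd to restart the process if it exits unexpectedly.
    #:respawn? #t))

(register-services example-daemon)

;; Start the service once shepherd has loaded this file.
(start 'example-daemon)
```

The directory that holds the log file is assumed to exist already. With `#:respawn? #t`, a service that keeps failing is eventually disabled by shepherd, which is the situation described in the original report.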

Your comment is kind of scary though! Shepherd is the thing I want to
stay up no matter what since it's responsible for monitoring and
restarting things. The idea that a misbehaving or poorly written service
could bring down the entire Shepherd process is a problem! Is there no
isolation?

> I'd suggest looking at .config/shepherd/shepherd.log but it's rather
> sparse. Still, it might have something useful.

Yes, this is the first place I looked, but unfortunately there wasn't
much usable information.

-- 
Katherine




Information forwarded to bug-guix <at> gnu.org:
bug#41429; Package guix. (Thu, 21 May 2020 14:06:01 GMT)

Message #14 received at 41429 <at> debbugs.gnu.org:

From: Efraim Flashner <efraim <at> flashner.co.il>
To: Katherine Cox-Buday <cox.katherine.e <at> gmail.com>
Cc: 41429 <at> debbugs.gnu.org
Subject: Re: bug#41429: Shepherd Sometimes Crashes
Date: Thu, 21 May 2020 17:04:42 +0300
On Thu, May 21, 2020 at 07:51:54AM -0500, Katherine Cox-Buday wrote:
> Efraim Flashner <efraim <at> flashner.co.il> writes:
> 
> > On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
> >> I am running shepherd as a userspace service manager on an alien distro.
> >> Occasionally (often enough to cause concern), Shepherd is crashing.
> >> I am unable to narrow down a cause, but anecdotally, it seems to happen
> >> more often when a service it's managing fails repeatedly and is
> >> disabled.
> >> 
> >> I'm running `strace` against the Shepherd process in an attempt to
> >> submit a better bug report, but this is all I have for now. Maybe others
> >> have also seen this behavior.
> >
> > I found it happens less often with shepherd-0.8. What version are you
> > running? Also possibly related, do you have mismatched versions of guile
> > between guix packages and your distro's native packages?
> 
> Sorry, I forgot to include the version! I am running 0.8 from a store
> which I update about once a week.
> 
> > I've also sometimes found shepherd to crash when I add a service where
> > the start command is "wrong", as though the error were so bad that
> > shepherd says "Nope! That's it! I quit!"
> 
> I'm doing very standard things with `make-forkexec-constructor`, so I
> wouldn't expect any problems there.
> 
> Your comment is kind of scary though! Shepherd is the thing I want to
> stay up no matter what since it's responsible for monitoring and
> restarting things. The idea that a misbehaving or poorly written service
> could bring down the entire Shepherd process is a problem! Is there no
> isolation?

I have a whole collection of attempts to integrate mcron with shepherd:
creating loops and adding jobs only when the service is active, or
forking off, collecting the child process, and then failing just enough
to make the service restart. Lots of cringe-worthy code. The more common
failure scenarios I see are that shepherd fails to start because it
doesn't like the start code of one of my services, or that actually
starting the service somehow kills it. All of those were with straight
lambdas as the start command, though.

Do you have your services writing out any logs? Maybe there's a clue
there.

> > I'd suggest looking at .config/shepherd/shepherd.log but it's rather
> > sparse. Still, it might have something useful.
> 
> Yes, this is the first place I looked, but unfortunately there wasn't
> much usable information.
> 
> -- 
> Katherine

-- 
Efraim Flashner   <efraim <at> flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

Information forwarded to bug-guix <at> gnu.org:
bug#41429; Package guix. (Thu, 21 May 2020 16:00:02 GMT)

Message #17 received at 41429 <at> debbugs.gnu.org:

From: Katherine Cox-Buday <cox.katherine.e <at> gmail.com>
To: Efraim Flashner <efraim <at> flashner.co.il>
Cc: 41429 <at> debbugs.gnu.org
Subject: Re: bug#41429: Shepherd Sometimes Crashes
Date: Thu, 21 May 2020 10:59:43 -0500
Efraim Flashner <efraim <at> flashner.co.il> writes:

>> Your comment is kind of scary though! Shepherd is the thing I want to
>> stay up no matter what since it's responsible for monitoring and
>> restarting things. The idea that a misbehaving or poorly written service
>> could bring down the entire Shepherd process is a problem! Is there no
>> isolation?
>
> I have a whole collection of attempts to integrate mcron with shepherd:
> creating loops and adding jobs only when the service is active, or
> forking off, collecting the child process, and then failing just enough
> to make the service restart. Lots of cringe-worthy code. The more common
> failure scenarios I see are that shepherd fails to start because it
> doesn't like the start code of one of my services, or that actually
> starting the service somehow kills it. All of those were with straight
> lambdas as the start command, though.

I'm not familiar with Shepherd's internals, so I don't know why
interacting with a cron is relevant.

> Do you have your services writing out any logs? Maybe there's a clue
> there.

Not yet, but I should be enabling this soon, and if they display
anything I'll report back.

Still, this seems beside the point: the bug is that Shepherd needs to
stay up regardless of what the services it's monitoring do.

-- 
Katherine




Information forwarded to bug-guix <at> gnu.org:
bug#41429; Package guix. (Fri, 22 May 2020 17:40:02 GMT)

Message #20 received at 41429 <at> debbugs.gnu.org:

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Katherine Cox-Buday <cox.katherine.e <at> gmail.com>
Cc: 41429 <at> debbugs.gnu.org
Subject: Re: bug#41429: Shepherd Sometimes Crashes
Date: Fri, 22 May 2020 19:39:09 +0200
Hello Katherine,

> I'm running `strace` against the Shepherd process in an attempt to
> submit a better bug report, but this is all I have for now. Maybe others
> have also seen this behavior.

Yes, I have observed this behavior. This should be fixed with the
upcoming 0.8.1 release of Shepherd (hopefully!).

See: https://lists.gnu.org/archive/html/bug-guix/2020-05/msg00241.html.

Thanks for reporting,

Mathieu




Severity set to 'important' from 'normal'. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 22 May 2020 20:16:02 GMT)

Merged 40981 41429. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 22 May 2020 20:16:02 GMT)

Bug closed; send any further explanations to 40981 <at> debbugs.gnu.org and Mathieu Othacehe <m.othacehe <at> gmail.com>. Request was from Mathieu Othacehe <mathieu <at> meru.i-did-not-set--mail-host-address--so-tickle-me> to control <at> debbugs.gnu.org. (Sat, 20 Jun 2020 10:07:03 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 18 Jul 2020 11:24:05 GMT)

This bug report was last modified 3 years and 255 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.