GNU bug report logs - #43643
start shepherd when a previous instance was killed by kill -9

Package: guix;

Reported by: gfleury <gfleury <at> disroot.org>

Date: Sun, 27 Sep 2020 08:01:02 UTC

Severity: normal

To reply to this bug, email your comments to 43643 AT debbugs.gnu.org.


Report forwarded to bug-guix <at> gnu.org:
bug#43643; Package guix. (Sun, 27 Sep 2020 08:01:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to gfleury <gfleury <at> disroot.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sun, 27 Sep 2020 08:01:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: gfleury <gfleury <at> disroot.org>
To: bug-guix <at> gnu.org
Subject: start shepherd when a previous instance was killed by kill -9   
Date: Sun, 27 Sep 2020 10:00:03 +0200
Hi,

When shepherd is killed, e.g. with `pkill -9 shepherd`, it leaves behind
`default-socket-file`, and when it is restarted without first removing the socket, e.g. with
---------------------------------------------------------
rm /var/run/user/1000/shepherd/socket
---------------------------------------------------------

it throws an error:
---------------------------------------------------------
3 (primitive-load "/home/gfleury/prod/shepherd/./shepherd")
In shepherd.scm:
    56:14  2 (main . _)
     49:6  1 (open-server-socket _)
In unknown file:
           0 (bind #<input-output: socket 16> #(1 "/run/user/1000?") #)

ERROR: In procedure bind:
In procedure bind: Address already in use
---------------------------------------------------------

something like this patch can fix it.
[0001-ensure-that-default-socket-file-is-not-present.patch (text/x-diff, inline)]
From 7d16c47bad6fd98cf0838d2fcd62735d846e7bab Mon Sep 17 00:00:00 2001
From: gfleury <gfleury <at> disroot.org>
Date: Sun, 27 Sep 2020 09:29:37 +0200
Subject: [PATCH] ensure that `default-socket-file` is not present.

* modules/shepherd.scm (main): Remove a possible `default-socket-file`
  left by a previous instance.
---
 modules/shepherd.scm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index 9f80f62..d18567e 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -147,7 +147,10 @@ already ~a threads running, disabling 'signalfd' support")
   (initialize-cli)
 
   (let ((config-file #f)
-	(socket-file default-socket-file)
+	(socket-file
+         (begin
+           (false-if-exception (delete-file default-socket-file))
+           default-socket-file))
         (pid-file    #f)
         (secure      #t)
         (logfile     #f))
-- 
2.28.0


Information forwarded to bug-guix <at> gnu.org:
bug#43643; Package guix. (Sun, 27 Sep 2020 14:20:02 GMT) Full text and rfc822 format available.

Message #8 received at 43643 <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> scratchpost.org>
To: gfleury <gfleury <at> disroot.org>
Cc: 43643 <at> debbugs.gnu.org
Subject: Re: bug#43643: start shepherd when a previous instance was killed
 by kill -9
Date: Sun, 27 Sep 2020 16:19:06 +0200
[Message part 1 (text/plain, inline)]
Hello,

On Sun, 27 Sep 2020 10:00:03 +0200
gfleury <gfleury <at> disroot.org> wrote:

> it throws an error:
> ---------------------------------------------------------
> 3 (primitive-load "/home/gfleury/prod/shepherd/./shepherd")
> In shepherd.scm:
>     56:14  2 (main . _)
>      49:6  1 (open-server-socket _)
> In unknown file:
>            0 (bind #<input-output: socket 16> #(1 "/run/user/1000?") #)
> 
> ERROR: In procedure bind:
> In procedure bind: Address already in use
> ---------------------------------------------------------
> 
> something like this patch can fix it.

Please don't do it that way.

Shepherd has to be able to ascertain that it is not running yet before
starting yet another instance in parallel.

I don't like PID and socket files either--but it's just what we have
available.

Maybe find out who is at the other side of the socket
(connect and then use getpeername on the socket or something ?
 maybe even just trying to connect fails, which would be good for this).

I think UNIX domain sockets are made in a way that it doesn't matter
whether the server or the client connects first, so even that would
probably not be reliable.

So maybe just live with having to remove the socket file yourself.

I'm open to other suggestions that are safe that accomplish the same goal.
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#43643; Package guix. (Sun, 27 Sep 2020 18:10:02 GMT) Full text and rfc822 format available.

Message #11 received at 43643 <at> debbugs.gnu.org (full text, mbox):

From: gfleury <at> disroot.org
To: "Danny Milosavljevic" <dannym <at> scratchpost.org>
Cc: 43643 <at> debbugs.gnu.org
Subject: Re: bug#43643: start shepherd when a previous instance was killed
 by kill -9
Date: Sun, 27 Sep 2020 18:09:21 +0000
hello,

On 27 September 2020 at 16:29, "Danny Milosavljevic" <dannym <at> scratchpost.org> wrote:

> Hello,
> 
> On Sun, 27 Sep 2020 10:00:03 +0200
> gfleury <gfleury <at> disroot.org> wrote:
> 
>> it throws an error:
>> ---------------------------------------------------------
>> 3 (primitive-load "/home/gfleury/prod/shepherd/./shepherd")
>> In shepherd.scm:
>> 56:14 2 (main . _)
>> 49:6 1 (open-server-socket _)
>> In unknown file:
>> 0 (bind #<input-output: socket 16> #(1 "/run/user/1000?") #)
>> 
>> ERROR: In procedure bind:
>> In procedure bind: Address already in use
>> ---------------------------------------------------------
>> 
>> something like this patch can fix it.
> 
> Please don't do it that way.
> 
> Shepherd has to be able to ascertain that it is not running yet before
> starting yet another instance in parallel.
> 
I missed that part.

> I don't like PID and socket files either--but it's just what we have
> available.
> 
> Maybe find out who is at the other side of the socket
> (connect and then use getpeername on the socket or something ?
> maybe even just trying to connect fails, which would be good for this).
> 
> I think UNIX domain sockets are made in a way that it doesn't matter
> whether the server or the client connects first, so even that would
> probably not be reliable.
> 
> So maybe just live with having to remove the socket file yourself.
> 
> I'm open to other suggestions that are safe that accomplish the same goal.

Yes, a better solution is needed.
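
To illustrate the connect-based check suggested earlier in the thread, here
is a minimal sketch. It is written in Python rather than Shepherd's Guile,
and it is not a proposed patch. It assumes Linux behaviour for stream-type
UNIX domain sockets, where connecting to a socket file that no process is
listening on fails with ECONNREFUSED. The names `socket_is_stale` and
`bind_server_socket` are hypothetical and do not exist in Shepherd.

```python
import os
import socket
import stat


def socket_is_stale(path: str) -> bool:
    """Return True if `path` is a socket file that nobody is listening on."""
    try:
        if not stat.S_ISSOCK(os.lstat(path).st_mode):
            return False  # not a socket file: never treat it as ours to delete
    except FileNotFoundError:
        return False      # nothing was left behind
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        return True       # file exists but has no listener, e.g. after kill -9
    finally:
        probe.close()
    return False          # connect succeeded: a live instance owns the socket


def bind_server_socket(path: str) -> socket.socket:
    """Bind a listening socket at `path`, removing only a stale socket file."""
    if socket_is_stale(path):
        os.unlink(path)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Still fails with "Address already in use" if a live instance exists.
        server.bind(path)
        server.listen(5)
    except OSError:
        server.close()
        raise
    return server
```

Unlike an unconditional delete, this keeps the "Address already in use"
failure when another instance is really running. The check and the unlink
are not atomic, so two instances starting at the same moment could still
race. A complete fix would probably need an additional lock.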




This bug report was last modified 3 years and 183 days ago.

