GNU bug report logs - #63190
[Shepherd] Nested calls lead to a hang

Package: guix

Reported by: Bruno Victal <mirai <at> makinata.eu>

Date: Sun, 30 Apr 2023 15:22:01 UTC

Severity: normal

Tags: notabug

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 63190 in the body.
You can then email your comments to 63190 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-guix <at> gnu.org:
bug#63190; Package guix. (Sun, 30 Apr 2023 15:22:02 GMT)

Acknowledgement sent to Bruno Victal <mirai <at> makinata.eu>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sun, 30 Apr 2023 15:22:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Bruno Victal <mirai <at> makinata.eu>
To: bug-guix <bug-guix <at> gnu.org>
Cc: mirai <at> makinata.eu, bjc <at> spork.org
Subject: [Shepherd] Nested calls lead to a hang
Date: Sun, 30 Apr 2023 16:21:14 +0100

Original discussion (IRC): <https://logs.guix.gnu.org/guix/2023-04-29.log#180735>


Minimal example (annotated):

test-system.scm:
--8<---------------cut here---------------start------------->8---
(use-modules (gnu)
             (gnu tests)
             (gnu packages)
             (gnu packages base) ; coreutils/sleep
             (gnu packages admin)  ; shepherd
             (gnu services shepherd))

;; Some dummy service whose start action simply waits for some seconds,
;; about enough to check with herd status before it exits.
(define dummy-service-type
  (shepherd-service-type
   'dummy
   (lambda (cfg)
     (shepherd-service
      (documentation "Dummy action to start service.")
      (provision '(dummy-service))
      (respawn? #f)    ; <<<<<< note, this disables the service!
      (modules (cons* '(gnu services herd)
                      %default-modules))
      (start #~(lambda _
                 (format #t "Starting a delay on dummy service.~%")
                 (fork+exec-command (list #$(file-append coreutils "/bin/sleep")
                                          "30"))))
      (stop #~(make-kill-destructor))
      (actions
       (list (shepherd-action
              (name 'my-action)
              (documentation "lorem ipsum")
              (procedure
               #~(lambda (x)
                   ;; Scenario 1: using code from (gnu services herd), this hangs shepherd
                   #;(start-service 'dummy)  ; hangs shepherd
                   ;; Scenario 2: this doesn't hang shepherd but do note that the service has to be re-enabled either manually or automatically here
                   #;(system* #$(file-append shepherd "/bin/herd") "start" "dummy-service")
                   ;; Scenario 3: use the already imported (shepherd service) module, doesn't hang shepherd
                   ;;             Like Scenario 2, the service must be re-enabled since (respawn? #f) disabled this.
                   ;;             Comment: Won't re-enabling mean that this service will relaunch once it quits?
                   ;;                      That means the service has to disable itself on a successful exit, perhaps within the (stop ...) field?
                   (start 'dummy-service))))))))
   #f  ; no config
   (description "lorem ipsum.")))

(operating-system
  (inherit %simple-os)
  (services
   (cons*
    (service dummy-service-type)
    %base-services)))

--8<---------------cut here---------------end--------------->8---


Required modifications to gnu/services/shepherd.scm for scenario 1:

--8<---------------cut here---------------start------------->8---
diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index b2601c0128..158806f421 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -282,7 +282,8 @@ (define (shepherd-service-file-name service)
 
 (define (shepherd-service-file service)
   "Return a file defining SERVICE."
   (scheme-file (shepherd-service-file-name service)
-               (with-imported-modules %default-imported-modules
+               (with-imported-modules (cons '(gnu services herd)
+                                            %default-imported-modules)
                  #~(begin
                      (use-modules #$@(shepherd-service-modules service))
--8<---------------cut here---------------end--------------->8---




Information forwarded to bug-guix <at> gnu.org:
bug#63190; Package guix. (Sat, 06 May 2023 17:27:02 GMT)

Message #8 received at 63190 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Bruno Victal <mirai <at> makinata.eu>
Cc: 63190 <at> debbugs.gnu.org, bjc <at> spork.org
Subject: Re: bug#63190: [Shepherd] Nested calls lead to a hang
Date: Sat, 06 May 2023 19:26:38 +0200

Hi,

Bruno Victal <mirai <at> makinata.eu> skribis:

> Original discussion (IRC): <https://logs.guix.gnu.org/guix/2023-04-29.log#180735>

[...]

>               (procedure
>                #~(lambda (x)
>                    ;; Scenario 1: using code from (gnu services herd), this hangs shepherd
>                    #;(start-service 'dummy)  ; hangs shepherd

(gnu services herd) provides a client to talk to the shepherd process.
However, the code of actions runs in the shepherd process itself, so
there’s no need to use the client library.  Don’t do that.  :-)

(Whether that leads to a deadlock depends; at first sight, I’d say
there’s no reason for this to deadlock in general, but you can of course
end up with a logic bug like A starts B, which spawns a client to start
A, which doesn’t start because it’s waiting for B.)
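
To illustrate the kind of logic bug described above, here is a minimal Guile sketch that models "who is waiting for whom" as a wait-for graph and checks whether a service ends up waiting on itself. The graph contents and the names `%waits-for`, `waits-on` and `wait-cycle?` are hypothetical and for illustration only; this is not how shepherd tracks services.

```scheme
;; Hypothetical wait-for graph: each entry maps a service to the
;; services it is currently waiting on.  Illustration only.
(define %waits-for
  '((a . (b))      ;A's start action is waiting for B to start
    (b . (a))      ;B spawned a client asking for A to start
    (c . ())))     ;C is not waiting for anything

(define (waits-on graph service)
  "Return the list of services that SERVICE waits on in GRAPH."
  (or (assq-ref graph service) '()))

(define (wait-cycle? graph service)
  "Return #t if SERVICE transitively waits on itself in GRAPH."
  (let loop ((pending (waits-on graph service))
             (visited '()))
    (cond ((null? pending) #f)
          ((eq? (car pending) service) #t)
          ((memq (car pending) visited)
           (loop (cdr pending) visited))
          (else
           (loop (append (waits-on graph (car pending)) (cdr pending))
                 (cons (car pending) visited))))))
```

With this toy graph, `(wait-cycle? %waits-for 'a)` finds the A, B, A cycle, while `(wait-cycle? %waits-for 'c)` does not find one.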

>                    ;; Scenario 2: this doesn't hang shepherd but do note that the service has to be re-enabled either manually or automatically here
>                    #;(system* #$(file-append shepherd "/bin/herd") "start" "dummy-service")

This is equivalent to the one above.

>                    ;; Scenario 3: use the already imported (shepherd service) module, doesn't hang shepherd
>                    ;;             Like Scenario 2, the service must be re-enabled since (respawn? #f) disabled this.
>                    ;;             Comment: Won't re-enabling mean that this service will relaunch once it quits?
>                    ;;                      That means the service has to disable itself on a successful exit, perhaps within the (stop ...) field?
>                    (start 'dummy-service))))))))

This should work without blocking.

However, starting a service from another one doesn’t sound great.

Could you give more context?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#63190; Package guix. (Mon, 08 May 2023 10:28:02 GMT)

Message #11 received at 63190 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Bruno Victal <mirai <at> makinata.eu>
Cc: 63190 <at> debbugs.gnu.org, bjc <at> spork.org
Subject: Re: bug#63190: [Shepherd] Nested calls lead to a hang
Date: Mon, 08 May 2023 12:27:20 +0200

Ludovic Courtès <ludo <at> gnu.org> skribis:

> Bruno Victal <mirai <at> makinata.eu> skribis:
>
>> Original discussion (IRC): <https://logs.guix.gnu.org/guix/2023-04-29.log#180735>
>
> [...]
>
>>               (procedure
>>                #~(lambda (x)
>>                    ;; Scenario 1: using code from (gnu services herd), this hangs shepherd
>>                    #;(start-service 'dummy)  ; hangs shepherd
>
> (gnu services herd) provides a client to talk to the shepherd process.
> However, the code of actions runs in the shepherd process itself, so
> there’s no need to use the client library.  Don’t do that.  :-)

Also, the socket created in (gnu services herd) lacks SOCK_NONBLOCK so
the code above is bound to block forever.
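
As a hedged illustration of the point above, here is a minimal Guile sketch of the difference between a blocking and a non-blocking client socket. It assumes Guile 3.0, where `SOCK_NONBLOCK` and `SOCK_CLOEXEC` are defined; the helper names `open-client-socket` and `non-blocking-port?` are hypothetical, and this is not the code found in (gnu services herd).

```scheme
;; Hypothetical helper, not part of (gnu services herd).
(define (open-client-socket non-blocking?)
  "Return a fresh Unix-domain stream socket; it is non-blocking when
NON-BLOCKING? is true."
  (socket PF_UNIX
          (if non-blocking?
              (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
              (logior SOCK_STREAM SOCK_CLOEXEC))
          0))

;; Hypothetical predicate: inspect the file status flags of PORT.
(define (non-blocking-port? port)
  "Return #t if PORT's file descriptor has O_NONBLOCK set."
  (not (zero? (logand O_NONBLOCK (fcntl port F_GETFL)))))

;; Example use: compare the two kinds of socket.
(define blocking-sock (open-client-socket #f))
(define non-blocking-sock (open-client-socket #t))
```

In a process built on Fibers, as shepherd 0.9 is, a non-blocking port gives the scheduler a chance to suspend the calling fiber when no data is available; a blocking port instead stalls the whole process on the read or write.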

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#63190; Package guix. (Fri, 12 May 2023 23:07:02 GMT)

Message #14 received at 63190 <at> debbugs.gnu.org (full text, mbox):

From: Brian Cully <bjc <at> spork.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 63190 <at> debbugs.gnu.org, Bruno Victal <mirai <at> makinata.eu>
Subject: Re: bug#63190: [Shepherd] Nested calls lead to a hang
Date: Fri, 12 May 2023 19:01:08 -0400

Ludovic Courtès <ludo <at> gnu.org> writes:

> (Whether that leads to a deadlock depends; at first sight, I’d say
> there’s no reason for this to deadlock in general, but you can of course
> end up with a logic bug like A starts B, which spawns a client to start
> A, which doesn’t start because it’s waiting for B.)

It's been a while since I looked at this, but my rough recollection is
the deadlock occurs because shepherd can only process one request over
its socket at a time. If that request happens to *also* try to talk over
the same socket, it'll hang indefinitely waiting for its turn to come
off the accept queue.

I'm not sure there's much to be done in the 0.9 version of shepherd
about it. I'm hoping that 0.10 and up will be able to cope with
situations like this without completely deadlocking the shepherd itself.
It's obviously pretty bad if pid 1 hangs for any reason at all, even
user error.

-bjc




Information forwarded to bug-guix <at> gnu.org:
bug#63190; Package guix. (Sat, 13 May 2023 09:47:02 GMT)

Message #17 received at 63190 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Brian Cully <bjc <at> spork.org>
Cc: 63190 <at> debbugs.gnu.org, Bruno Victal <mirai <at> makinata.eu>
Subject: Re: bug#63190: [Shepherd] Nested calls lead to a hang
Date: Sat, 13 May 2023 11:45:51 +0200

Brian Cully <bjc <at> spork.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> (Whether that leads to a deadlock depends; at first sight, I’d say
>> there’s no reason for this to deadlock in general, but you can of course
>> end up with a logic bug like A starts B, which spawns a client to start
>> A, which doesn’t start because it’s waiting for B.)
>
> It's been a while since I looked at this, but my rough recollection is
> the deadlock occurs because shepherd can only process one request over
> its socket at a time.

That’s not the case in 0.9: it can process several requests
concurrently.  However, as I wrote in a followup message, the client
socket created by (gnu services herd) lacks SOCK_NONBLOCK, which can
thus block the process on reads and writes.

Ludo’.




Added tag(s) notabug. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 13 May 2023 09:47:02 GMT)

bug closed, send any further explanations to 63190 <at> debbugs.gnu.org and Bruno Victal <mirai <at> makinata.eu> Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 13 May 2023 09:47:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 10 Jun 2023 11:24:09 GMT)
