GNU bug report logs - #56209
Shepherd 0.9 not cleanly unmounting root

Previous Next

Package: guix;

Reported by: angry rectangle <angryrectangle <at> cock.li>

Date: Sat, 25 Jun 2022 05:36:02 UTC

Severity: important

Merged with 56250

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 56209 in the body.
You can then email your comments to 56209 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#56209; Package guix. (Sat, 25 Jun 2022 05:36:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to angry rectangle <angryrectangle <at> cock.li>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 25 Jun 2022 05:36:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: angry rectangle <angryrectangle <at> cock.li>
To: bug-guix <at> gnu.org
Subject: Shepherd 0.9 not cleanly unmounting root
Date: Sat, 25 Jun 2022 01:27:02 -0400
Since the upgrade to shepherd 0.9, I get "recovering journal" every single time I start my computer.
To be specific, "recovering journal" appears after I enter my encryption password in the initrd.
I assume this means the filesystem wasn't cleanly unmounted.
I am doing a proper shutdown, using either "reboot" or "halt." 

I've attached the minimal config I've been using.
It's nothing special other than encrypted root.
I'm using an SSD with a gpt partition table.
No custom packages or external channels were used when configuring the system.

This is for my desktop computer, but I have the exact same problem with a similar minimal config on my laptop.
Mostly the same sitution there with an SSD, gpt table, and encrypted root.

The guix commit 400c9ed3d779308e56038305d40cd93acb496180 is the specific commit that upgrades shepherd and causes me this problem. The previous commit is fine.
I'm can confirm that it's still broken on recent commits. I'm on 696e2cc345f015c32f211bf0d0330c04b1cf5f15.

Thanks




Information forwarded to bug-guix <at> gnu.org:
bug#56209; Package guix. (Sat, 25 Jun 2022 05:45:01 GMT) Full text and rfc822 format available.

Message #8 received at 56209 <at> debbugs.gnu.org (full text, mbox):

From: angry rectangle <angryrectangle <at> cock.li>
To: 56209 <at> debbugs.gnu.org
Date: Sat, 25 Jun 2022 01:42:33 -0400
[Message part 1 (text/plain, inline)]
forgot the attachment.
[conf.scm (text/plain, attachment)]

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 27 Jun 2022 08:05:03 GMT) Full text and rfc822 format available.

Merged 56209 56250. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 27 Jun 2022 08:05:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#56209; Package guix. (Mon, 27 Jun 2022 21:52:02 GMT) Full text and rfc822 format available.

Message #15 received at 56209 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: angry rectangle <angryrectangle <at> cock.li>
Cc: 56209 <at> debbugs.gnu.org
Subject: Re: bug#56209: Shepherd 0.9 not cleanly unmounting root
Date: Mon, 27 Jun 2022 23:50:58 +0200
[Message part 1 (text/plain, inline)]
Hi,

angry rectangle <angryrectangle <at> cock.li> skribis:

> Since the upgrade to shepherd 0.9, I get "recovering journal" every single time I start my computer.
> To be specific, "recovering journal" appears after I enter my encryption password in the initrd.
> I assume this means the filesystem wasn't cleanly unmounted.
> I am doing a proper shutdown, using either "reboot" or "halt." 

I can see that as well.

> The guix commit 400c9ed3d779308e56038305d40cd93acb496180 is the specific commit that upgrades shepherd and causes me this problem. The previous commit is fine.
> I'm can confirm that it's still broken on recent commits. I'm on 696e2cc345f015c32f211bf0d0330c04b1cf5f15.

Preliminary investigation suggests this is because shepherd doesn’t
close log files beforehand (in 0.9, those specified as #:log-file to
‘make-forkexec-constructor’ & co. are opened by PID 1; conversely,
shepherd 0.8 would open them in the child process.)

To be continued…

Thanks for reporting the issue and finding the offending commit!

Ludo’.

PS: Below my (ugly) debugging tricks for posterity.  To see those
    messages, you typically need to start a VM with ‘-serial stdio’ and
    pass “console=ttyS0” to the kernel.  (It’s best to start a
    standalone VM with an image created by ‘guix system image -t
    qcow2’.)

[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index d58afb27e3..25d747d226 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -299,6 +299,9 @@ (define %root-file-system-shepherd-service
    (stop #~(lambda _
              ;; Return #f if successfully stopped.
              (sync)
+             (call-with-port (open-file "/dev/console" "w0")
+               (lambda (port)
+                 (display "This is my last message.\n" port)))
 
              (call-with-blocked-asyncs
               (lambda ()
@@ -314,11 +317,24 @@ (define %root-file-system-shepherd-service
                   ;; Close /dev/console.
                   (for-each close-fdes '(0 1 2))
 
-                  ;; At this point, there are no open files left, so the
-                  ;; root file system can be re-mounted read-only.
-                  (mount #f "/" #f
-                         (logior MS_REMOUNT MS_RDONLY)
-                         #:update-mtab? #f)
+                  (open-fdes "/dev/null" O_RDONLY)
+                  (open-fdes "/dev/console" O_WRONLY)
+                  (open-fdes "/dev/console" O_WRONLY)
+                  (current-output-port (fdopen 1 "w0"))
+                  (current-error-port (fdopen 2 "w0"))
+                  (pk 'umount-root)
+
+                  (catch 'system-error
+                    (lambda ()
+                      ;; At this point, there are no open files left, so the
+                      ;; root file system can be re-mounted read-only.
+                      (mount #f "/" #f
+                             (logior MS_REMOUNT MS_RDONLY)
+                             #:update-mtab? #f))
+                    (lambda args
+                      (pk 'umount-root-error args)
+                      #f))
+                  (pk 'done-umount-root)
 
                   #f)))))
    (respawn? #f)))
@@ -406,7 +422,28 @@ (define (file-system-shepherd-service file-system)
                       ;; Make sure PID 1 doesn't keep TARGET busy.
                       (chdir "/")
 
-                      (umount #$target)
+                      (call-with-port (open-file "/dev/console" "w0")
+                        (lambda (port)
+                          (parameterize ((current-output-port port)
+                                         (current-error-port port))
+                            (pk 'umount #$target)
+                            #$(if (file-system-mount-may-fail? file-system)
+                                  #~(catch 'system-error
+                                      (lambda ()
+                                        (umount #$target))
+                                      (const #f))
+                                  #~(catch 'system-error
+                                      (lambda ()
+                                        (umount #$target))
+                                      (lambda args
+                                        (pk 'umount-error args)
+                                        (system* #$(file-append (@ (gnu
+                                                                    packages
+                                                                    lsof)
+                                                                   lsof)
+                                                                "/bin/lsof"))
+                                        #f)))
+                            (pk 'done-umount #$target))))
                       #f))
 
             ;; We need additional modules.
diff --git a/gnu/system/examples/bare-bones.tmpl b/gnu/system/examples/bare-bones.tmpl
index 387e4b12ba..1f9012c167 100644
--- a/gnu/system/examples/bare-bones.tmpl
+++ b/gnu/system/examples/bare-bones.tmpl
@@ -2,8 +2,8 @@
 ;; for a "bare bones" setup, with no X11 display server.
 
 (use-modules (gnu))
-(use-service-modules networking ssh)
-(use-package-modules screen ssh)
+(use-service-modules networking ssh shepherd)
+(use-package-modules screen ssh admin)
 
 (operating-system
   (host-name "komputilo")
@@ -38,6 +38,13 @@
                                         "audio" "video")))
                %base-user-accounts))
 
+  (essential-services
+   (modify-services (operating-system-default-essential-services
+                     this-operating-system)
+     (shepherd-root-service-type
+      config => (shepherd-configuration
+                 (shepherd shepherd-0.8)))))
+
   ;; Globally-installed packages.
   (packages (cons screen %base-packages))
 

Added indication that bug 56209 blocks53214 Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 27 Jun 2022 21:52:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#56209; Package guix. (Tue, 28 Jun 2022 22:03:01 GMT) Full text and rfc822 format available.

Message #20 received at 56209 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: angry rectangle <angryrectangle <at> cock.li>
Cc: 56209 <at> debbugs.gnu.org
Subject: Re: bug#56209: Shepherd 0.9 not cleanly unmounting root
Date: Wed, 29 Jun 2022 00:02:45 +0200
[Message part 1 (text/plain, inline)]
Hi,

I believe the attached patch fixes the problem.  I’ll do more testing on
my side but I’d be grateful if someone would give it a try too.

Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scm
index 17b7b38a15..dea58354d9 100644
--- a/gnu/packages/admin.scm
+++ b/gnu/packages/admin.scm
@@ -328,7 +328,18 @@ (define-public shepherd-0.9
                                   version ".tar.gz"))
               (sha256
                (base32
-                "0l2arn6gsyw88xk9phxnyplvv1mn8sqp3ipgyyb0nszdzvxlgd36"))))
+                "0l2arn6gsyw88xk9phxnyplvv1mn8sqp3ipgyyb0nszdzvxlgd36"))
+              (modules '((guix build utils)))
+              (snippet
+               ;; Avoid continuation barriers so (@ (fibers) sleep) can be
+               ;; called from a service's 'stop' method
+               '(substitute* "modules/shepherd/service.scm"
+                  (("call-with-blocked-asyncs")   ;in 'stop' method
+                   "(lambda (thunk) (thunk))")
+                  (("\\(for-each-service\n")      ;in 'shutdown-services'
+                   "((lambda (proc)
+                       (for-each proc
+                                 (fold-services cons '())))\n")))))
     (arguments
      (list #:configure-flags #~'("--localstatedir=/var")
            #:make-flags #~'("GUILE_AUTO_COMPILE=0")
diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index d58afb27e3..1fd4cd84f3 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -300,27 +300,36 @@ (define %root-file-system-shepherd-service
              ;; Return #f if successfully stopped.
              (sync)
 
-             (call-with-blocked-asyncs
-              (lambda ()
-                (let ((null (%make-void-port "w")))
-                  ;; Close 'shepherd.log'.
-                  (display "closing log\n")
-                  ((@ (shepherd comm) stop-logging))
+             (let ((null (%make-void-port "w")))
+               ;; Close 'shepherd.log'.
+               (display "closing log\n")
+               ((@ (shepherd comm) stop-logging))
 
-                  ;; Redirect the default output ports..
-                  (set-current-output-port null)
-                  (set-current-error-port null)
+               ;; Redirect the default output ports..
+               (set-current-output-port null)
+               (set-current-error-port null)
 
-                  ;; Close /dev/console.
-                  (for-each close-fdes '(0 1 2))
+               ;; Close /dev/console.
+               (for-each close-fdes '(0 1 2))
 
-                  ;; At this point, there are no open files left, so the
-                  ;; root file system can be re-mounted read-only.
-                  (mount #f "/" #f
-                         (logior MS_REMOUNT MS_RDONLY)
-                         #:update-mtab? #f)
+               (let loop ((n 10))
+                 (unless (catch 'system-error
+                           (lambda ()
+                             ;; At this point, there are no open files left, so the
+                             ;; root file system can be re-mounted read-only.
+                             (mount #f "/" #f
+                                    (logior MS_REMOUNT MS_RDONLY)
+                                    #:update-mtab? #f)
+                             #t)
+                           (const #f))
+                   (unless (zero? n)
+                     ;; Yield to the other fibers.  That gives logging fibers
+                     ;; an opportunity to close log files so the 'mount' call
+                     ;; doesn't fail with EBUSY.
+                     ((@ (fibers) sleep) 1)
+                     (loop (- n 1)))))
 
-                  #f)))))
+               #f)))
    (respawn? #f)))
 
 (define root-file-system-service-type

Information forwarded to bug-guix <at> gnu.org:
bug#56209; Package guix. (Wed, 29 Jun 2022 00:32:02 GMT) Full text and rfc822 format available.

Message #23 received at 56209 <at> debbugs.gnu.org (full text, mbox):

From: angry rectangle <angryrectangle <at> cock.li>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 56209 <at> debbugs.gnu.org
Subject: Re: bug#56209: Shepherd 0.9 not cleanly unmounting root
Date: Tue, 28 Jun 2022 20:18:48 -0400
I tried your patch on one of my computers and it works.
Thank you.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Fri, 01 Jul 2022 10:30:02 GMT) Full text and rfc822 format available.

Notification sent to angry rectangle <angryrectangle <at> cock.li>:
bug acknowledged by developer. (Fri, 01 Jul 2022 10:30:03 GMT) Full text and rfc822 format available.

Message #28 received at 56209-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: angry rectangle <angryrectangle <at> cock.li>
Cc: 56209-done <at> debbugs.gnu.org
Subject: Re: bug#56209: Shepherd 0.9 not cleanly unmounting root
Date: Fri, 01 Jul 2022 12:29:31 +0200
Hi,

angry rectangle <angryrectangle <at> cock.li> skribis:

> I tried your patch on one of my computers and it works.

Thanks for testing.

Pushed as 0483c71cc5aeb3b69f6deb154fe12c0b2e6dc17f.  The reason is took
me more time is that I wanted to have a system test for that to make
sure it doesn’t come back to haunt us in the future.  Now we should be
fine.  :-)

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Fri, 01 Jul 2022 10:30:03 GMT) Full text and rfc822 format available.

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Fri, 01 Jul 2022 10:30:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 29 Jul 2022 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 265 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.