GNU bug report logs - #68800
Guix waits for termination of a kernel thread

Previous Next

Package: guix;

Reported by: Tomas Volf <~@wolfsden.cz>

Date: Mon, 29 Jan 2024 17:35:02 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 68800 in the body.
You can then email your comments to 68800 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#68800; Package guix. (Mon, 29 Jan 2024 17:35:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tomas Volf <~@wolfsden.cz>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 29 Jan 2024 17:35:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tomas Volf <~@wolfsden.cz>
To: bug-guix <at> gnu.org
Subject: Guix waits for termination of a kernel thread
Date: Mon, 29 Jan 2024 18:34:33 +0100
[Message part 1 (text/plain, inline)]
Hello,

when Guix machine is shutting down, it keeps waiting for PID associated with
[mt76-tx phy0] to terminate.  Since it is a kernel thread, it does not happen.

Previous discussion on this bug was done via email, and is copied here:



Date: Sun, 7 Jan 2024 15:59:51 +0100
From: Tomas Volf <~@wolfsden.cz>
To: guix-devel <at> gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
> We are pleased to announce the GNU Shepherd version 0.10.3, a bug-fix
> release of the new 0.10.x series, representing 51 commits over 6 months.

Congratulations on the release :)

>
>   ** Do not accidentally wait for Linux kernel thread completion
>      (<https://issues.guix.gnu.org/67132>)
>
>   In cases a PID file contained a bogus PID or one that’s only valid in a
>   separate PID namespace, shepherd could end up waiting for the termination of
>   what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
>   situation is now recognized and avoided.

This is great, I will not have to remember to run `modprobe -r mt7921e' before
each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)

Have a nice 2024,
Tomas Volf



Date: Wed, 10 Jan 2024 00:34:48 +0100
From: Ludovic Courtès <ludo <at> gnu.org>
To: guix-devel <at> gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

Tomas Volf <~@wolfsden.cz> skribis:

> On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:

[...]

>>   ** Do not accidentally wait for Linux kernel thread completion
>>      (<https://issues.guix.gnu.org/67132>)
>>
>>   In cases a PID file contained a bogus PID or one that’s only valid in a
>>   separate PID namespace, shepherd could end up waiting for the termination of
>>   what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
>>   situation is now recognized and avoided.
>
> This is great, I will not have to remember to run `modprobe -r mt7921e' before
> each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)

D’oh, why did you have to do that?  How did Shepherd end up with “wrong” PID?

I hope this release fixes it!

Ludo’.



Date: Wed, 10 Jan 2024 17:38:17 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: guix-devel <at> gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

On 2024-01-10 00:34:48 +0100, Ludovic Courtès wrote:
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
>
> [...]
>
> >>   ** Do not accidentally wait for Linux kernel thread completion
> >>      (<https://issues.guix.gnu.org/67132>)
> >>
> >>   In cases a PID file contained a bogus PID or one that’s only valid in a
> >>   separate PID namespace, shepherd could end up waiting for the termination of
> >>   what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
> >>   situation is now recognized and avoided.
> >
> > This is great, I will not have to remember to run `modprobe -r mt7921e' before
> > each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)
>
> D’oh, why did you have to do that?

Otherwise the shepherd would be stuck on shutdown waiting for process named

    [mt76-tx phy0]

to terminate with messages along the lines of:

    shepherd[1]: waiting for process termination (processes left: (1 678))

It is a kernel thread as far as I can tell (based on
https://stackoverflow.com/a/12231039):

    $ cd /proc/678
    $ cat cmdline
    $ readlink exe; echo $?
    1

Removing the module mt7921e stops the thread, so shepherd does not wait for it.

> How did Shepherd end up with “wrong” PID?

That I do not know.  It is visible in `ps' output, so I assume shepherd picked
it up on its own somehow.

>
> I hope this release fixes it!

As far as I can tell, the 0.10.3 was already added into guix:

    $ ps 1 | cat
      PID TTY      STAT   TIME COMMAND
        1 ?        Sl     0:01 /gnu/store/bhynhk0c6ssq3fqqc59fvhxjzwywsjbb-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/06mz0yjkghi7r6d7lmhvv7gryipljhdd-shepherd-0.10.3/bin/shepherd
+--config /gnu/store/klkqq2y65k141rlipq4ls0w2rlhds12h-shepherd.conf

So I have to say it sadly did not resolve this issue.  I am unsure why though.
I am not familiar with Shepherd's code base, but quick look at the git log
suggested that procedure (@@ (shepherd service) pseudo-process?) is the relevant
one.  When I try it from a REPL, it returns #t.

    $ guix shell guile shepherd guile-fibers -- guile
    GNU Guile 3.0.9
    Copyright (C) 1995-2023 Free Software Foundation, Inc.

    Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
    This program is free software, and you are welcome to redistribute it
    under certain conditions; type `,show c' for details.

    Enter `,help' for help.
    scheme@(guile-user)> ,use (shepherd service)
    scheme@(guile-user)> ((@@ (shepherd service) pseudo-process?) 688)
    $1 = #t

So it *should* work?  However the issue is caused by non-free WiFi driver on a
corrupted kernel, so I am not sure if it is even problem that needs to be
solved...  I would (obviously) like to see it resolved, but I probably cannot
even bug report it, since it requires non-free hardware and software to
reproduce.

Tomas

PS: It is interesting that `guix shell guile shepherd' is not enough, the
    guile-fibers have to be explicitly specified as well.  Is that expected?



Date: Wed, 10 Jan 2024 17:50:19 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>, guix-devel <at> gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

PS:

On 2024-01-10 17:38:17 +0100, Tomas Volf wrote:
>     scheme@(guile-user)> ((@@ (shepherd service) pseudo-process?) 688)

The pid is different than above, because this was after a reboot.

Tomas



Date: Thu, 11 Jan 2024 13:41:39 +0100
From: Ludovic Courtès <ludo <at> gnu.org>
To: guix-devel <at> gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

Hello,

Tomas Volf <~@wolfsden.cz> skribis:

> Otherwise the shepherd would be stuck on shutdown waiting for process named
>
>     [mt76-tx phy0]
>
> to terminate with messages along the lines of:
>
>     shepherd[1]: waiting for process termination (processes left: (1 678))
>
> It is a kernel thread as far as I can tell (based on
> https://stackoverflow.com/a/12231039):
>
>     $ cd /proc/678
>     $ cat cmdline
>     $ readlink exe; echo $?
>     1
>
> Removing the module mt7921e stops the thread, so shepherd does not wait for it.

Ooooh.

Then I’m afraid this bug isn’t fixed yet because that code (“waiting for
process termination”) is currently in Guix, not in Shepherd.

However, ‘processes’, which is what is used here and which is defined in
(guix build syscalls), already checks for kernel threads, though it
does it differently than what I implemented in shepherd:

  (define (kernel? pid)
    "Return #t if PID designates a \"kernel thread\" rather than a normal
  user-land process."
    (let ((stat (call-with-input-file (format #f "/proc/~a/stat" pid)
                  (compose string-tokenize read-string))))
      ;; See proc.txt in Linux's documentation for the list of fields.
      (match stat
        ((pid tcomm state ppid pgrp sid tty_nr tty_pgrp flags min_flt
              cmin_flt maj_flt cmaj_flt utime stime cutime cstime
              priority nice num_thread it_real_value start_time
              vsize rss rsslim
              (= string->number start_code) (= string->number end_code) _ ...)
         ;; Got this obscure trick from sysvinit's 'killall5' program.
         (and (zero? start_code) (zero? end_code))))))

It would be great if you could check whether this approach works for
you.

(I had completely forgotten about this code.  Funny thing is this one
was inspired by sysvinit, whereas the one in Shepherd was inspired by
systemd.  A sign of times!)

Ludo’.



Date: Thu, 11 Jan 2024 14:12:51 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: guix-devel <at> gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

On 2024-01-11 13:41:39 +0100, Ludovic Courtès wrote:
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > Otherwise the shepherd would be stuck on shutdown waiting for process named
> >
> >     [mt76-tx phy0]
> >
> > to terminate with messages along the lines of:
> >
> >     shepherd[1]: waiting for process termination (processes left: (1 678))
> >
> > It is a kernel thread as far as I can tell (based on
> > https://stackoverflow.com/a/12231039):
> >
> >     $ cd /proc/678
> >     $ cat cmdline
> >     $ readlink exe; echo $?
> >     1
> >
> > Removing the module mt7921e stops the thread, so shepherd does not wait for it.
>
> Ooooh.
>
> Then I’m afraid this bug isn’t fixed yet because that code (“waiting for
> process termination”) is currently in Guix, not in Shepherd.
>
> However, ‘processes’, which is what is used here and which is defined in
> (guix build syscalls), already checks for kernel threads, though it
> does it differently than what I implemented in shepherd:
>
>   (define (kernel? pid)
>     "Return #t if PID designates a \"kernel thread\" rather than a normal
>   user-land process."
>     (let ((stat (call-with-input-file (format #f "/proc/~a/stat" pid)
>                   (compose string-tokenize read-string))))
>       ;; See proc.txt in Linux's documentation for the list of fields.
>       (match stat
>         ((pid tcomm state ppid pgrp sid tty_nr tty_pgrp flags min_flt
>               cmin_flt maj_flt cmaj_flt utime stime cutime cstime
>               priority nice num_thread it_real_value start_time
>               vsize rss rsslim
>               (= string->number start_code) (= string->number end_code) _ ...)
>          ;; Got this obscure trick from sysvinit's 'killall5' program.
>          (and (zero? start_code) (zero? end_code))))))
>
> It would be great if you could check whether this approach works for
> you.

Ah, that code indeed returns #f for the pid in question:

    scheme@(guix-user)> ((@@ (guix build syscalls) kernel?) 688)
    $1 = #f

The stat file:

    $ cat /proc/688/stat
    688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0 0 0 0 0

So the start_code is not zero (I would guess it is -1).  I have no idea what
that means though.

Tomas



Date: Mon, 29 Jan 2024 17:31:33 +0100
From: Ludovic Courtès <ludo <at> gnu.org>
To: guix-devel <at> gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

Hi,

Tomas Volf <~@wolfsden.cz> skribis:

> Ah, that code indeed returns #f for the pid in question:
>
>     scheme@(guix-user)> ((@@ (guix build syscalls) kernel?) 688)
>     $1 = #f
>
> The stat file:
>
>     $ cat /proc/688/stat
>     688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0 0 0 0 0
>
> So the start_code is not zero (I would guess it is -1).  I have no idea what
> that means though.

What about this method (from shepherd)?

--8<---------------cut here---------------start------------->8---
(define (linux-process-flags pid)
  "Return the process flags of @var{pid} (or'd @code{PF_} constants), assuming
the Linux /proc file system is mounted; raise a @code{system-error} exception
otherwise."
  (call-with-input-file (string-append "/proc/" (number->string pid)
                                       "/stat")
    (lambda (port)
      (define line
        (get-string-all port))

      ;; Parse like systemd's 'is_kernel_thread' function.
      (let ((offset (string-index line #\))))     ;offset past 'tcomm' field
        (match (and offset
                    (string-tokenize (string-drop line (+ offset 1))))
          ((state ppid pgrp sid tty-nr tty-pgrp flags . _)
           (or (string->number flags) 0))
          (_
           0))))))

;; Per-process flag defined in <linux/sched.h>.
(define PF_KTHREAD #x00200000)                    ;I am a kernel thread

(define (linux-kernel-thread? pid)
  "Return true if @var{pid} is a Linux kernel thread."
  (= PF_KTHREAD (logand (linux-process-flags pid) PF_KTHREAD)))
--8<---------------cut here---------------end--------------->8---

If it works better, we can use that in syscalls.scm as well.

Ludo’.

PS: Having an entry in bug-guix would ensure we don’t lose track of this.


--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#68800; Package guix. (Mon, 29 Jan 2024 17:45:02 GMT) Full text and rfc822 format available.

Message #8 received at 68800 <at> debbugs.gnu.org (full text, mbox):

From: Tomas Volf <~@wolfsden.cz>
To: 68800 <at> debbugs.gnu.org
Subject: Re: Guix waits for termination of a kernel thread
Date: Mon, 29 Jan 2024 18:44:00 +0100
[Message part 1 (text/plain, inline)]
> Date: Mon, 29 Jan 2024 17:31:33 +0100
> From: Ludovic Courtès <ludo <at> gnu.org>
> To: guix-devel <at> gnu.org
> Subject: Re: GNU Shepherd 0.10.3 released
>
> Hi,
>
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > Ah, that code indeed returns #f for the pid in question:
> >
> >     scheme@(guix-user)> ((@@ (guix build syscalls) kernel?) 688)
> >     $1 = #f
> >
> > The stat file:
> >
> >     $ cat /proc/688/stat
> >     688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0 0 0 0 0
> >
> > So the start_code is not zero (I would guess it is -1).  I have no idea what
> > that means though.
>
> What about this method (from shepherd)?
>
> --8<---------------cut here---------------start------------->8---
> (define (linux-process-flags pid)
>   "Return the process flags of @var{pid} (or'd @code{PF_} constants), assuming
> the Linux /proc file system is mounted; raise a @code{system-error} exception
> otherwise."
>   (call-with-input-file (string-append "/proc/" (number->string pid)
>                                        "/stat")
>     (lambda (port)
>       (define line
>         (get-string-all port))
>
>       ;; Parse like systemd's 'is_kernel_thread' function.
>       (let ((offset (string-index line #\))))     ;offset past 'tcomm' field
>         (match (and offset
>                     (string-tokenize (string-drop line (+ offset 1))))
>           ((state ppid pgrp sid tty-nr tty-pgrp flags . _)
>            (or (string->number flags) 0))
>           (_
>            0))))))
>
> ;; Per-process flag defined in <linux/sched.h>.
> (define PF_KTHREAD #x00200000)                    ;I am a kernel thread
>
> (define (linux-kernel-thread? pid)
>   "Return true if @var{pid} is a Linux kernel thread."
>   (= PF_KTHREAD (logand (linux-process-flags pid) PF_KTHREAD)))
> --8<---------------cut here---------------end--------------->8---
>
> If it works better, we can use that in syscalls.scm as well.

Yes, that does seem to work:

    scheme@(guile-user)> ,use (ice-9 match)
    scheme@(guile-user)> ,use (ice-9 textual-ports)
    scheme@(guile-user)> (define (linux-process-flags pid)
      "Return the process flags of @var{pid} (or'd @code{PF_} constants), assuming
    the Linux /proc file system is mounted; raise a @code{system-error} exception
    otherwise."
      (call-with-input-file (string-append "/proc/" (number->string pid)
                                           "/stat")
        (lambda (port)
          (define line
            (get-string-all port))

          ;; Parse like systemd's 'is_kernel_thread' function.
          (let ((offset (string-index line #\))))     ;offset past 'tcomm' field
            (match (and offset
                        (string-tokenize (string-drop line (+ offset 1))))
              ((state ppid pgrp sid tty-nr tty-pgrp flags . _)
               (or (string->number flags) 0))
              (_
               0))))))
    scheme@(guile-user)> ;; Per-process flag defined in <linux/sched.h>.
    (define PF_KTHREAD #x00200000)                    ;I am a kernel thread
    scheme@(guile-user)> (define (linux-kernel-thread? pid)
      "Return true if @var{pid} is a Linux kernel thread."
      (= PF_KTHREAD (logand (linux-process-flags pid) PF_KTHREAD)))
    scheme@(guile-user)> (linux-kernel-thread? 685)
    $5 = #t

Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
[signature.asc (application/pgp-signature, inline)]

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 20 Feb 2024 09:39:01 GMT) Full text and rfc822 format available.

Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Tue, 20 Feb 2024 10:10:02 GMT) Full text and rfc822 format available.

Notification sent to Tomas Volf <~@wolfsden.cz>:
bug acknowledged by developer. (Tue, 20 Feb 2024 10:10:02 GMT) Full text and rfc822 format available.

Message #15 received at 68800-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 68800-done <at> debbugs.gnu.org
Subject: Re: bug#68800: Guix waits for termination of a kernel thread
Date: Tue, 20 Feb 2024 11:08:15 +0100
Hi Tomas,

Tomas Volf <~@wolfsden.cz> skribis:

>> What about this method (from shepherd)?
>>
>> --8<---------------cut here---------------start------------->8---
>> (define (linux-process-flags pid)
>>   "Return the process flags of @var{pid} (or'd @code{PF_} constants), assuming
>> the Linux /proc file system is mounted; raise a @code{system-error} exception
>> otherwise."
>>   (call-with-input-file (string-append "/proc/" (number->string pid)
>>                                        "/stat")
>>     (lambda (port)
>>       (define line
>>         (get-string-all port))
>>
>>       ;; Parse like systemd's 'is_kernel_thread' function.
>>       (let ((offset (string-index line #\))))     ;offset past 'tcomm' field
>>         (match (and offset
>>                     (string-tokenize (string-drop line (+ offset 1))))
>>           ((state ppid pgrp sid tty-nr tty-pgrp flags . _)
>>            (or (string->number flags) 0))
>>           (_
>>            0))))))
>>
>> ;; Per-process flag defined in <linux/sched.h>.
>> (define PF_KTHREAD #x00200000)                    ;I am a kernel thread
>>
>> (define (linux-kernel-thread? pid)
>>   "Return true if @var{pid} is a Linux kernel thread."
>>   (= PF_KTHREAD (logand (linux-process-flags pid) PF_KTHREAD)))
>> --8<---------------cut here---------------end--------------->8---
>>
>> If it works better, we can use that in syscalls.scm as well.
>
> Yes, that does seem to work:

Pushed as 34c79c6ae8103ebae9ce08c81a9220a6b82b05f6.

Thank you!

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 19 Mar 2024 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 129 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.