GNU bug report logs - #43344
"basic" system tests fail (and all the other ones) on guix master

Previous Next

Package: guix;

Reported by: Danny Milosavljevic <dannym <at> scratchpost.org>

Date: Fri, 11 Sep 2020 17:52:01 UTC

Severity: normal

Merged with 43352

Done: Leo Famulari <leo <at> famulari.name>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 43344 in the body.
You can then email your comments to 43344 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Fri, 11 Sep 2020 17:52:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Danny Milosavljevic <dannym <at> scratchpost.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Fri, 11 Sep 2020 17:52:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> scratchpost.org>
To: <bug-guix <at> gnu.org>
Subject: "basic" system tests fail (and all the other ones) on guix master
Date: Fri, 11 Sep 2020 19:50:58 +0200
[Message part 1 (text/plain, inline)]
Hi,

as of guix master commit 0fb974be9c3e1e22a2145c9c602c44cd10cef2b0 all system
tests, including "basic", fail:

$ guix environment --pure guix --ad-hoc git guile-readline guile-json nano guile-zlib guile-lzlib
(env)$ make TESTS=basic check-system
loading '/gnu/store/s3limrgxj4pd6b4psra66phary2nmqx4-linux-vm-loader'...
[    2.776050] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
environment variable `PATH' set to `/gnu/store/j3jlpncfqvykkq6sx7h4ly1rdcr2a8qq'
creating partition table with 2 partitions (20.0 MiB, 40.0 MiB)...
Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, .
Backtrace:
           3 (primitive-load "/gnu/store/liacrhaiab35pcfzdqsa71fvp7f?")
In ./gnu/build/vm.scm:
   470:21  2 (initialize-hard-disk "/dev/vda" #:bootloader-package _ ?)
    300:2  1 (initialize-partition-table "/dev/vda" (#<<partitio???@ build-log 1
?@ build-log 29023 1
?@ build-log 29023 1
>@ build-log 29023 1
 @ build-log 29023 1
?@ build-log 29023 1
?@ build-log 29023 1
?@ build-log 29023 1
)@ build-log 29023 1
 @ build-log 29023 1
?@ build-log 29023 1
?@ build-log 29023 1
?@ build-log 29023 1
)@ build-log 29023 1
@ build-log 29023 1

In ./guix/build/utils.scm:
    654:6  0 (invoke _ . _)

./guix/build/utils.scm:654:6: In procedure invoke:
ERROR:
  1. &invoke-error:
      program: "parted"
      arguments: ("--script" "/dev/vda" "mklabel" "msdos" "mkpart" "primary" "e)
      exit-status: 1
      term-signal: #f
      stop-signal: #f
[    3.446031] Unregister pv shared memory for cpu 0
[    3.447989] reboot: Restarting system
[    3.449167] reboot: machine restart
Backtrace: 
           1 (primitive-load "/gnu/store/1xp94hddh88szwkbqyb49jirmf5?")
In ./gnu/build/vm.scm:
   198:12  0 (load-in-linux-vm _ #:output _ #:qemu _ #:memory-size _ ?)

./gnu/build/vm.scm:198:12: In procedure load-in-linux-vm:
guest VM code exited with a non-zero status 256
note: keeping build directory `/tmp/guix-build-qemu-image.drv-0'
builder for `/gnu/store/ks5w1bwcm2rg1fq5s9hc78grxpnm0wpf-qemu-image.drv' failed1
build of /gnu/store/ks5w1bwcm2rg1fq5s9hc78grxpnm0wpf-qemu-image.drv failed
View build log at '/var/log/guix/drvs/ks/5w1bwcm2rg1fq5s9hc78grxpnm0wpf-qemu-im.
cannot build derivation `/gnu/store/8vhwchcxlns64nnqr13p0liw745s6pp8-run-vm.sh.t
cannot build derivation `/gnu/store/672vqgm8mx0a6k2iwsf13q73gmk5vdc4-basic.drv't
guix build: error: build of `/gnu/store/672vqgm8mx0a6k2iwsf13q73gmk5vdc4-basic.d
make: *** [Makefile:6007: check-system] Error 1
dannym <at> bayfront ~/src/guix-master/guix [env]$

The same happens on my laptop.

./pre-inst-env guix describe
Git checkout:
  repository: /home/dannym/src/guix-master/guix
  branch: master
  commit: 0fb974be9c3e1e22a2145c9c602c44cd10cef2b0

(ssh://dannym <at> git.sv.gnu.org/srv/git/guix.git)

I've tried to add a call to wipefs before parted--it doesn't help.

Before the problem surfaced I saw a qemu major update.
[Message part 2 (application/pgp-signature, inline)]

Merged 43344 43352. Request was from Oleg Pykhalov <go.wigust <at> gmail.com> to control <at> debbugs.gnu.org. (Sat, 12 Sep 2020 10:58:02 GMT) Full text and rfc822 format available.

Merged 43344 43352. Request was from Oleg Pykhalov <go.wigust <at> gmail.com> to control <at> debbugs.gnu.org. (Sat, 12 Sep 2020 11:02:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Mon, 14 Sep 2020 13:28:01 GMT) Full text and rfc822 format available.

Message #12 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Danny Milosavljevic <dannym <at> scratchpost.org>
Cc: 43344 <at> debbugs.gnu.org
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones)
 on guix master
Date: Mon, 14 Sep 2020 15:26:52 +0200
Hello Danny,

>   1. &invoke-error:
>       program: "parted"
>       arguments: ("--script" "/dev/vda" "mklabel" "msdos" "mkpart" "primary" "e)

So it looks like the parted script failed inside the VM, while running
"initialize-partition-table".

This work fine both on the build farm and on my machine, which makes the
debug harder :(.

Anything special with your hardware? KVM support disabled maybe?

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Mon, 14 Sep 2020 16:32:02 GMT) Full text and rfc822 format available.

Message #15 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> scratchpost.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 43344 <at> debbugs.gnu.org
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones)
 on guix master
Date: Mon, 14 Sep 2020 18:31:10 +0200
[Message part 1 (text/plain, inline)]
Hi Mathieu,

On Mon, 14 Sep 2020 15:26:52 +0200
Mathieu Othacehe <othacehe <at> gnu.org> wrote:

> Anything special with your hardware? KVM support disabled maybe?

The culprit had been the Linux kernel update to 5.8.8.

After downgrading to 5.8.7 it works just fine--no other changes done.

Previously, I had tried also to add wipefs -a before the parted--that
hadn't helped either.
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Tue, 15 Sep 2020 22:37:01 GMT) Full text and rfc822 format available.

Message #18 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Danny Milosavljevic <dannym <at> scratchpost.org>,
 Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 43344 <at> debbugs.gnu.org, Leo Famulari <leo <at> famulari.name>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones)
 on guix master
Date: Tue, 15 Sep 2020 18:34:21 -0400
Hi,

Danny Milosavljevic <dannym <at> scratchpost.org> writes:

> On Mon, 14 Sep 2020 15:26:52 +0200
> Mathieu Othacehe <othacehe <at> gnu.org> wrote:
>
>> Anything special with your hardware? KVM support disabled maybe?
>
> The culprit had been the Linux kernel update to 5.8.8.
>
> After downgrading to 5.8.7 it works just fine--no other changes done.
>
> Previously, I had tried also to add wipefs -a before the parted--that
> hadn't helped either.

That's useful information, but we should not stay frozen on the 5.8.7
kernel for much longer.  5.8.8 contains many bug fixes, some of which
might fix potentially exploitable flaws.

It would be useful to know if this problem still occurs with 5.8.9,
which has since come out.  If so, we should do a bisection between 5.8.7
and 5.8.8 to find out which upstream commit introduced the problem.

Would anyone like to investigate this further?

        Mark




Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Tue, 15 Sep 2020 23:07:01 GMT) Full text and rfc822 format available.

Message #21 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Danny Milosavljevic <dannym <at> scratchpost.org>, 43344 <at> debbugs.gnu.org,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones) on
 guix master
Date: Tue, 15 Sep 2020 19:06:28 -0400
On Tue, Sep 15, 2020 at 06:34:21PM -0400, Mark H Weaver wrote:
> That's useful information, but we should not stay frozen on the 5.8.7
> kernel for much longer.  5.8.8 contains many bug fixes, some of which
> might fix potentially exploitable flaws.
> 
> It would be useful to know if this problem still occurs with 5.8.9,
> which has since come out.  If so, we should do a bisection between 5.8.7
> and 5.8.8 to find out which upstream commit introduced the problem.
> 
> Would anyone like to investigate this further?

I will try to reproduce the bug with 5.8.9 now. I will try the bisection
if time permits.




Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Wed, 16 Sep 2020 14:00:02 GMT) Full text and rfc822 format available.

Message #24 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> scratchpost.org>
Cc: 43344 <at> debbugs.gnu.org, Leo Famulari <leo <at> famulari.name>,
 Mark H Weaver <mhw <at> netris.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones)
 on guix master
Date: Wed, 16 Sep 2020 15:59:00 +0200
[Message part 1 (text/plain, inline)]
Hi,

Danny Milosavljevic <dannym <at> scratchpost.org> skribis:

> environment variable `PATH' set to `/gnu/store/j3jlpncfqvykkq6sx7h4ly1rdcr2a8qq'
> creating partition table with 2 partitions (20.0 MiB, 40.0 MiB)...
> Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, .

[...]

>   1. &invoke-error:
>       program: "parted"
>       arguments: ("--script" "/dev/vda" "mklabel" "msdos" "mkpart" "primary" "e)
>       exit-status: 1
>       term-signal: #f
>       stop-signal: #f

The code in question in Parted:

--8<---------------cut here---------------start------------->8---
                if (!add_partition (disk, part)) {
                        ok[i - 1] = 0;
                        errnums[i - 1] = errno;
                }

[…]

        char *bad_part_list = NULL;
        /* now warn about any errors */
        for (i = 1; i <= lpn; i++) {
                if (ok[i - 1] || errnums[i - 1] == ENXIO)
                        continue;
                if (bad_part_list == NULL) {
                        bad_part_list = malloc (lpn * 5);
                        if (!bad_part_list)
                                goto cleanup;
                        bad_part_list[0] = 0;
                }
                sprintf (bad_part_list + strlen (bad_part_list), "%d, ", i);
        }
        if (bad_part_list == NULL)
                ret = 1;
        else {
                bad_part_list[strlen (bad_part_list) - 2] = 0;
                if (ped_exception_throw (
                        PED_EXCEPTION_ERROR,
                        PED_EXCEPTION_IGNORE_CANCEL,
                        _("Partition(s) %s on %s have been written, but we have "
                          "been unable to inform the kernel of the change, "
                          "probably because it/they are in use.  As a result, "
                          "the old partition(s) will remain in use.  You "
                          "should reboot now before making further changes."),
                        bad_part_list, disk->dev->path) == PED_EXCEPTION_IGNORE)
                        ret = 1;
                free (bad_part_list);
        }
--8<---------------cut here---------------end--------------->8---

With the patch below, I strace’d ‘parted’, which gives:

--8<---------------cut here---------------start------------->8---
$ make check-system TESTS=basic

[…]

ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=253, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=254, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=255, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=256, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
write(2, "Error", 5Error)                    = 5
write(2, ": ", 2: )                       = 2
write(2, "Partition(s) 1, 2, 3, 4, 5, 6, 7"..., 495Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 on /dev/vda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.) = 495
write(2, "\n", 1
)                       = 1
--8<---------------cut here---------------end--------------->8---

So I threw more virtual RAM at it:

[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
@@ -446,6 +450,7 @@ system that is passed to 'populate-root-file-system'."
                                      #:bootloader-installer
                                      #+(bootloader-installer bootloader)))))))
    #:system system
+   #:memory-size 1024
    #:make-disk-image? #t
    #:disk-image-size disk-image-size
    #:disk-image-format disk-image-format
[Message part 3 (text/plain, inline)]
… but that doesn’t help.

Ideas?

Thanks,
Ludo’.

[Message part 4 (text/x-patch, inline)]
diff --git a/gnu/build/vm.scm b/gnu/build/vm.scm
index 287d099f79..e793b5b518 100644
--- a/gnu/build/vm.scm
+++ b/gnu/build/vm.scm
@@ -297,7 +297,7 @@ actual /dev name based on DEVICE."
                                      partition-size)
                             partitions)
                        ", "))
-  (apply invoke "parted" "--script"
+  (apply invoke "strace" "parted" "--script"
          device "mklabel" label-type
          (options partitions offset))
 
diff --git a/gnu/packages/linux.scm b/gnu/packages/linux.scm
index 59ffb334e0..72fb3ca49d 100644
--- a/gnu/packages/linux.scm
+++ b/gnu/packages/linux.scm
@@ -349,15 +349,15 @@ corresponding UPSTREAM-SOURCE (an origin), using the given DEBLOB-SCRIPTS."
 
 ;; The current "stable" kernel. That is, the most recently released major
 ;; version.
-(define-public linux-libre-5.8-version "5.8.7")
+(define-public linux-libre-5.8-version "5.8.8")
 (define deblob-scripts-5.8
   (linux-libre-deblob-scripts
-   linux-libre-5.8-version
+   "5.8.7"
    (base32 "07z7sglyrfh0706icqqf3shadf638pvyid9386r661ds5lbsa2mw")
    (base32 "0j6jba5fcddqlb42f95gjl78jisfla4nswqila074gglcrbnl9q7")))
 (define-public linux-libre-5.8-pristine-source
   (let ((version linux-libre-5.8-version)
-        (hash (base32 "1zhpzlhl2ykna2nc70m72wlgyv1pkvkpfssb4k8p5pwlkh1ga2vv")))
+        (hash (base32 "0xm901zvvrwsb9k88la6pb65nybi43bygiyz1z68njwsx6ripxik")))
    (make-linux-libre-source version
                             (%upstream-linux-source version hash)
                             deblob-scripts-5.8)))
diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
index 80a8618729..49489b6159 100644
--- a/gnu/system/vm.scm
+++ b/gnu/system/vm.scm
@@ -376,6 +376,10 @@ system that is passed to 'populate-root-file-system'."
 
              (set-path-environment-variable "PATH" '("bin" "sbin") inputs)
 
+             (setenv "PATH"
+                     (string-append #+(file-append strace "/bin") ":"
+                                    (getenv "PATH")))
+
              (let* ((graphs     '#$(match inputs
                                      (((names . _) ...)
                                       names)))

Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Wed, 16 Sep 2020 14:13:01 GMT) Full text and rfc822 format available.

Message #27 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Danny Milosavljevic <dannym <at> scratchpost.org>, 43344 <at> debbugs.gnu.org,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones) on
 guix master
Date: Wed, 16 Sep 2020 10:12:31 -0400
On Tue, Sep 15, 2020 at 07:06:28PM -0400, Leo Famulari wrote:
> I will try to reproduce the bug with 5.8.9 now. I will try the bisection
> if time permits.

It also fails with 5.8.9.




Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Wed, 16 Sep 2020 14:54:01 GMT) Full text and rfc822 format available.

Message #30 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> scratchpost.org>
To: Leo Famulari <leo <at> famulari.name>
Cc: Mark H Weaver <mhw <at> netris.org>, 43344 <at> debbugs.gnu.org,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones)
 on guix master
Date: Wed, 16 Sep 2020 16:52:45 +0200
[Message part 1 (text/plain, inline)]
commit 692d0626557451c4b557397f20b7394b612d0289
Author: Christoph Hellwig <hch <at> lst.de>
Date:   Tue Sep 1 11:59:41 2020 +0200

    block: fix locking in bdev_del_partition
    
    [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
    
    We need to hold the whole device bd_mutex to protect against
    other thread concurrently deleting out partition before we get
    to it, and thus causing a use after free.
    
    Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
    Reported-by: syzbot+6448f3c229bc52b82f69 <at> syzkaller.appspotmail.com
    Signed-off-by: Christoph Hellwig <hch <at> lst.de>
    Signed-off-by: Jens Axboe <axboe <at> kernel.dk>
    Signed-off-by: Sasha Levin <sashal <at> kernel.org>

?
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Wed, 16 Sep 2020 15:03:02 GMT) Full text and rfc822 format available.

Message #33 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> scratchpost.org>
To: Leo Famulari <leo <at> famulari.name>
Cc: Mark H Weaver <mhw <at> netris.org>, 43344 <at> debbugs.gnu.org,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones)
 on guix master
Date: Wed, 16 Sep 2020 17:02:39 +0200
[Message part 1 (text/plain, inline)]
On Wed, 16 Sep 2020 16:52:45 +0200
Danny Milosavljevic <dannym <at> scratchpost.org> wrote:

> commit 692d0626557451c4b557397f20b7394b612d0289
> Author: Christoph Hellwig <hch <at> lst.de>
> Date:   Tue Sep 1 11:59:41 2020 +0200
> 
>     block: fix locking in bdev_del_partition
>     
>     [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
>     
>     We need to hold the whole device bd_mutex to protect against
>     other thread concurrently deleting out partition before we get
>     to it, and thus causing a use after free.
>     
>     Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
>     Reported-by: syzbot+6448f3c229bc52b82f69 <at> syzkaller.appspotmail.com
>     Signed-off-by: Christoph Hellwig <hch <at> lst.de>
>     Signed-off-by: Jens Axboe <axboe <at> kernel.dk>
>     Signed-off-by: Sasha Levin <sashal <at> kernel.org>
> 

int bdev_del_partition(struct block_device *bdev, int partno)
{
        struct block_device *bdevp;
        struct hd_struct *part = NULL;
        int ret;

        bdevp = bdget_disk(bdev->bd_disk, partno);
        if (!bdevp)
                return -ENOMEM; <--------------

...
}

struct block_device *bdget_disk(struct gendisk *disk, int partno)
{
        struct hd_struct *part;
        struct block_device *bdev = NULL;

        part = disk_get_part(disk, partno);
        if (part)
                bdev = bdget(part_devt(part));
        disk_put_part(part);

        return bdev;
}

struct block_device *bdget(dev_t dev)
{
        struct block_device *bdev;
        struct inode *inode;

        inode = iget5_locked(blockdev_superblock, hash(dev),
                        bdev_test, bdev_set, &dev);

        if (!inode)
                return NULL; <--------------------
[...]
}
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Wed, 16 Sep 2020 16:23:02 GMT) Full text and rfc822 format available.

Message #36 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Danny Milosavljevic <dannym <at> scratchpost.org>
Cc: Mark H Weaver <mhw <at> netris.org>, 43344 <at> debbugs.gnu.org,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones) on
 guix master
Date: Wed, 16 Sep 2020 12:22:44 -0400
[Message part 1 (text/plain, inline)]
On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
> commit 692d0626557451c4b557397f20b7394b612d0289
> Author: Christoph Hellwig <hch <at> lst.de>
> Date:   Tue Sep 1 11:59:41 2020 +0200
> 
>     block: fix locking in bdev_del_partition
>     
>     [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
>     
>     We need to hold the whole device bd_mutex to protect against
>     other thread concurrently deleting out partition before we get
>     to it, and thus causing a use after free.
>     
>     Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
>     Reported-by: syzbot+6448f3c229bc52b82f69 <at> syzkaller.appspotmail.com
>     Signed-off-by: Christoph Hellwig <hch <at> lst.de>
>     Signed-off-by: Jens Axboe <axboe <at> kernel.dk>
>     Signed-off-by: Sasha Levin <sashal <at> kernel.org>
> 
> ?

Do you think that's the faulty commit? I won't have time to test it
today.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Thu, 17 Sep 2020 01:39:02 GMT) Full text and rfc822 format available.

Message #39 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Leo Famulari <leo <at> famulari.name>, Danny Milosavljevic
 <dannym <at> scratchpost.org>
Cc: Mathieu Othacehe <othacehe <at> gnu.org>, 43344 <at> debbugs.gnu.org
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones)
 on guix master
Date: Wed, 16 Sep 2020 21:36:49 -0400
Leo Famulari <leo <at> famulari.name> writes:

> On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
>> commit 692d0626557451c4b557397f20b7394b612d0289
>> Author: Christoph Hellwig <hch <at> lst.de>
>> Date:   Tue Sep 1 11:59:41 2020 +0200
>> 
>>     block: fix locking in bdev_del_partition
>>     
>>     [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
>>     
>>     We need to hold the whole device bd_mutex to protect against
>>     other thread concurrently deleting out partition before we get
>>     to it, and thus causing a use after free.
>>     
>>     Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
>>     Reported-by: syzbot+6448f3c229bc52b82f69 <at> syzkaller.appspotmail.com
>>     Signed-off-by: Christoph Hellwig <hch <at> lst.de>
>>     Signed-off-by: Jens Axboe <axboe <at> kernel.dk>
>>     Signed-off-by: Sasha Levin <sashal <at> kernel.org>
>> 
>> ?
>
> Do you think that's the faulty commit? I won't have time to test it
> today.

Looks like a good bet to me.  Many thanks to Leo for testing 5.8.9, and
to Danny for identifying the likely culprit.

Next, it probably makes sense to test 5.8.9 with the above commit
reverted.  Would someone like to try it?  If that works, we can avoid
the bisection, resume kernel updates, and revert this change across all
of our kernels until a better solution is found.

        Mark




Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Thu, 17 Sep 2020 02:22:02 GMT) Full text and rfc822 format available.

Message #42 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Danny Milosavljevic <dannym <at> scratchpost.org>, 43344 <at> debbugs.gnu.org,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones) on
 guix master
Date: Wed, 16 Sep 2020 22:21:17 -0400
On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> Next, it probably makes sense to test 5.8.9 with the above commit
> reverted.  Would someone like to try it?  If that works, we can avoid
> the bisection, resume kernel updates, and revert this change across all
> of our kernels until a better solution is found.

I'll do this tonight.




Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Thu, 17 Sep 2020 08:41:01 GMT) Full text and rfc822 format available.

Message #45 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Marius Bakke <marius <at> gnu.org>
To: Leo Famulari <leo <at> famulari.name>, Mark H Weaver <mhw <at> netris.org>
Cc: Mathieu Othacehe <othacehe <at> gnu.org>, 43344 <at> debbugs.gnu.org
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones) on
 guix master
Date: Thu, 17 Sep 2020 10:40:07 +0200
[Message part 1 (text/plain, inline)]
Leo Famulari <leo <at> famulari.name> writes:

> On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
>> Next, it probably makes sense to test 5.8.9 with the above commit
>> reverted.  Would someone like to try it?  If that works, we can avoid
>> the bisection, resume kernel updates, and revert this change across all
>> of our kernels until a better solution is found.
>
> I'll do this tonight.

A fix for this is available in mainline:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a

(via <https://gitlab.gnome.org/GNOME/gparted/-/issues/111>)
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Thu, 17 Sep 2020 14:06:01 GMT) Full text and rfc822 format available.

Message #48 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Danny Milosavljevic <dannym <at> scratchpost.org>, 43344 <at> debbugs.gnu.org,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones) on
 guix master
Date: Thu, 17 Sep 2020 10:05:44 -0400
On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> > On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
> >> commit 692d0626557451c4b557397f20b7394b612d0289
> >> Author: Christoph Hellwig <hch <at> lst.de>
> >> Date:   Tue Sep 1 11:59:41 2020 +0200
> >> 
> >>     block: fix locking in bdev_del_partition

> Next, it probably makes sense to test 5.8.9 with the above commit
> reverted.  Would someone like to try it?  If that works, we can avoid
> the bisection, resume kernel updates, and revert this change across all
> of our kernels until a better solution is found.

Using linux-libre 5.8.9 with that commit reverted, `guix system vm` does
work again.




Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Thu, 17 Sep 2020 14:21:01 GMT) Full text and rfc822 format available.

Message #51 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Marius Bakke <marius <at> gnu.org>
Cc: Mark H Weaver <mhw <at> netris.org>, 43344 <at> debbugs.gnu.org,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones) on
 guix master
Date: Thu, 17 Sep 2020 10:20:26 -0400
[Message part 1 (text/plain, inline)]
On Thu, Sep 17, 2020 at 10:40:07AM +0200, Marius Bakke wrote:
> Leo Famulari <leo <at> famulari.name> writes:
> 
> > On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> >> Next, it probably makes sense to test 5.8.9 with the above commit
> >> reverted.  Would someone like to try it?  If that works, we can avoid
> >> the bisection, resume kernel updates, and revert this change across all
> >> of our kernels until a better solution is found.
> >
> > I'll do this tonight.
> 
> A fix for this is available in mainline:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a
> 
> (via <https://gitlab.gnome.org/GNOME/gparted/-/issues/111>)

What is the recommended way to report this to the stable maintainers?
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Thu, 17 Sep 2020 15:29:01 GMT) Full text and rfc822 format available.

Message #54 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> scratchpost.org>
To: Leo Famulari <leo <at> famulari.name>
Cc: Mark H Weaver <mhw <at> netris.org>, 43344 <at> debbugs.gnu.org,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones)
 on guix master
Date: Thu, 17 Sep 2020 17:28:06 +0200
[Message part 1 (text/plain, inline)]
Hi Leo,

On Thu, 17 Sep 2020 10:05:44 -0400
Leo Famulari <leo <at> famulari.name> wrote:

> On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> > > On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:  
> > >> commit 692d0626557451c4b557397f20b7394b612d0289
> > >> Author: Christoph Hellwig <hch <at> lst.de>
> > >> Date:   Tue Sep 1 11:59:41 2020 +0200
> > >> 
> > >>     block: fix locking in bdev_del_partition  
> 
> > Next, it probably makes sense to test 5.8.9 with the above commit
> > reverted.  Would someone like to try it?  If that works, we can avoid
> > the bisection, resume kernel updates, and revert this change across all
> > of our kernels until a better solution is found.  
> 
> Using linux-libre 5.8.9 with that commit reverted, `guix system vm` does
> work again.

Thanks for testing it!

What the commit does is lock the drive whose partition table is being
modified.  That sounds like a good idea in general.  But parted does not
expect the error code ENOMEM and thus misjudges the situation.

See also:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a

for the fix that goes into Linux 5.9.

We could also patch parted to also accept ENOMEM in addition to ENXIO if we
wanted to.  I wouldn't (it diverges from what upstream is doing--too much work).
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#43344; Package guix. (Thu, 17 Sep 2020 19:11:01 GMT) Full text and rfc822 format available.

Message #57 received at 43344 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Marius Bakke <marius <at> gnu.org>, Leo Famulari <leo <at> famulari.name>
Cc: Mathieu Othacehe <othacehe <at> gnu.org>, 43344 <at> debbugs.gnu.org
Subject: Re: bug#43344: "basic" system tests fail (and all the other ones)
 on guix master
Date: Thu, 17 Sep 2020 15:08:58 -0400
Marius Bakke <marius <at> gnu.org> writes:

> Leo Famulari <leo <at> famulari.name> writes:
>
>> On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
>>> Next, it probably makes sense to test 5.8.9 with the above commit
>>> reverted.  Would someone like to try it?  If that works, we can avoid
>>> the bisection, resume kernel updates, and revert this change across all
>>> of our kernels until a better solution is found.
>>
>> I'll do this tonight.
>
> A fix for this is available in mainline:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a
>
> (via <https://gitlab.gnome.org/GNOME/gparted/-/issues/111>)

Thank you, Marius!  I see that this patch has been included in the
recently released linux-libre-5.8.10.  Assuming that it fixes our
problem (which I suspect it does), we can simply update our kernels and
close this bug.

Leo, would you like to update our kernels?  I won't be able to get to it
today, and I'm looking to offload that job anyway.

      Mark




bug closed, send any further explanations to 43344 <at> debbugs.gnu.org and Danny Milosavljevic <dannym <at> scratchpost.org> Request was from Leo Famulari <leo <at> famulari.name> to control <at> debbugs.gnu.org. (Fri, 18 Sep 2020 13:42:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 17 Oct 2020 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 191 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.