GNU bug report logs - #31268
'guix system vm-image' hangs

Package: guix;

Reported by: George myglc2 Clemmer <myglc2 <at> gmail.com>

Date: Thu, 26 Apr 2018 04:09:02 UTC

Severity: important

Done: ludo <at> gnu.org (Ludovic Courtès)

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31268 in the body.
You can then email your comments to 31268 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Thu, 26 Apr 2018 04:09:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to George myglc2 Clemmer <myglc2 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Thu, 26 Apr 2018 04:09:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: George myglc2 Clemmer <myglc2 <at> gmail.com>
To: bug-guix <at> gnu.org
Cc: help-guix <help-guix <at> gnu.org>, Eric Bavier <ericbavier <at> centurylink.net>
Subject: Re: 'guix system vm-image' hangs
Date: Thu, 26 Apr 2018 00:07:46 -0400

[Message part 1 (text/plain, inline)]

On 04/26/2018 at 03:37 Eric Bavier writes:

> Hi George,
>
> I've seen the same lately with 'guix system vm':
>
> On Wed, 25 Apr 2018 21:39:30 -0400
> George myglc2 Clemmer <myglc2 <at> gmail.com> wrote:
>
>> 'guix system vm-image' hangs on guix (GNU Guix) 0.14.0.4439-eb88e built
>> from source today. Please see log except below. It worked on guix (GNU
>> Guix) 0.14.0.4196-2c92a. built <2018-04-18 Wed>
>>
>> TIA - George
>>
> [...]
>> environment variable `PATH' set to `/gnu/store/1n3cvnxrz1m6jqc09d9m1grbfbs91kmv-qemu-minimal-2.12.0/bin:/gnu/store/h8670f6j1rawhhaz8wvdy5gv79lny0am-parted-3.2/sbin:/gnu/store/nm305rpb2mvridkyj4l3636nc9ql4lf9-e2fsprogs-1.43.6/bin:/gnu/store/nm305rpb2mvridkyj4l3636nc9ql4lf9-e2fsprogs-1.43.6/sbin:/gnu/store/0sq2nflm42x0znkv44add0gk82khkcb6-dosfstools-4.1/sbin:/gnu/store/0hl513mnpkhszm2hjai2w9cxmpxs0vgq-sed-4.4/bin:/gnu/store/1h44pkgdd7n6s3i2vjh54awsvfmc219j-grep-3.1/bin:/gnu/store/kgzvfby2ggi1xawsh5vjh4s93qk2dp9k-coreutils-8.28/bin:/gnu/store/k7r2m2wgj8x8jjhccwjsiimp0dlzxb7i-findutils-4.6.0/bin:/gnu/store/ahxc89r6npzf2bbl8yg5vdjicskzzjf3-gawk-4.1.4/bin:/gnu/store/cwwvdxd9d778a2xp8vbb38vl62rx992g-guix-0.14.0-11.ab85cf7/bin:/gnu/store/cwwvdxd9d778a2xp8vbb38vl62rx992g-guix-0.14.0-11.ab85cf7/sbin'
>> creating partition table with 2 partitions (10190.0 MiB, 40.0 MiB)...
>> [    1.344115] tsc: Refined TSC clocksource calibration: 3392.348 MHz
>> [    1.344865] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x30e6121e742, max_idle_ns: 440795321217 ns
>> [    1.856062] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>   C-c C-c
>
> Same log output on my end.
>
>> g1 <at> g1 ~/src/vm/vms/sysi20$ guix --version
>> guix (GNU Guix) 0.14.0.4439-eb88e
>>
>
> I bisected the git repo to commit:
>
> ```
> commit ba39d3280759fb20a4ff0092a5afac721a5fa769
> Author: Mark H Weaver <mhw <at> netris.org>
> Date:   Tue Apr 24 16:51:45 2018 -0400
>
>     gnu: linux-libre: Update to 4.16.4.
>
>     * gnu/packages/linux.scm (%linux-libre-version): Update to 4.16.4.
>     (%linux-libre-hash): Update hash.
>
> ```
>
> Perhaps we should open a bug.
>
> `~Eric

Thank you Eric! Actually I meant to send it to bug-guix <at> gnu.org
before. Here is something that fails here.

guix system vm-image --image-size=10G sysi19.scm

[sysi19.scm (application/octet-stream, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Thu, 26 Apr 2018 14:10:01 GMT) Full text and rfc822 format available.

Message #8 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: George myglc2 Clemmer <myglc2 <at> gmail.com>
Cc: Mark H Weaver <mhw <at> netris.org>, Eric Bavier <ericbavier <at> centurylink.net>,
 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Thu, 26 Apr 2018 16:09:52 +0200

Hello,

George myglc2 Clemmer <myglc2 <at> gmail.com> skribis:

> Thank you Eric! Actually I meant to send it to bug-guix <at> gnu.org
> before. Here is something that fails here.
>
> guix system vm-image --image-size=10G sysi19.scm

As of 37b9be5878d1694967a41d313de97f7a957df120 I see that ‘guix system
vm’ or “make check-system TESTS=basic” hang.  Specifically, the VM that
creates the image hangs like this:

--8<---------------cut here---------------start------------->8---
Welcome, this is GNU's early boot Guile.
Use '--repl' for an initrd REPL.

loading kernel modules...
[    0.618457] usbcore: registered new interface driver usb-storage
[    0.620812] usbcore: registered new interface driver uas
[    0.626976] hidraw: raw HID events driver (C) Jiri Kosina
[    0.628093] usbcore: registered new interface driver usbhid
[    0.628804] usbhid: USB HID core driver
[    0.635125] isci: Intel(R) C600 SAS Controller Driver - version 1.2.0
[    0.654740] PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.673721] PCI Interrupt Link [LNKD] enabled at IRQ 10
[    0.693204] PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.714857] PCI Interrupt Link [LNKB] enabled at IRQ 11
[    0.759651] virtio_blk virtio5: [vda] 143360 512-byte logical blocks (73.4 MB/70.0 MiB)
[    0.765620] FS-Cache: Loaded
[    0.767868] 9pnet: Installing 9P2000 support
[    0.768899] 9p: Installing v9fs 9p2000 file system support
[    0.769556] FS-Cache: Netfs '9p' registered for caching
configuring QEMU networking...
[    0.780494] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
loading '/gnu/store/d116816xkzfaa7y1d2srvbxy5ip5cm12-linux-vm-loader'...
[    0.802603] random: fast init done
environment variable `PATH' set to `/gnu/store/1n3cvnxrz1m6jqc09d9m1grbfbs91kmv-qemu-minimal-2.12.0/bin:/gnu/store/h8670f6j1rawhhaz8wvdy5gv79lny0am-parted-3.2/sbin:/gnu/store/nm305rpb2mvridkyj4l3636nc9ql4lf9-e2fsprogs-1.43.6/bin:/gnu/store/nm305rpb2mvridkyj4l3636nc9ql4lf9-e2fsprogs-1.43.6/sbin:/gnu/store/0sq2nflm42x0znkv44add0gk82khkcb6-dosfstools-4.1/sbin:/gnu/store/0hl513mnpkhszm2hjai2w9cxmpxs0vgq-sed-4.4/bin:/gnu/store/1h44pkgdd7n6s3i2vjh54awsvfmc219j-grep-3.1/bin:/gnu/store/kgzvfby2ggi1xawsh5vjh4s93qk2dp9k-coreutils-8.28/bin:/gnu/store/k7r2m2wgj8x8jjhccwjsiimp0dlzxb7i-findutils-4.6.0/bin:/gnu/store/ahxc89r6npzf2bbl8yg5vdjicskzzjf3-gawk-4.1.4/bin'
creating partition table with 2 partitions (20.0 MiB, 40.0 MiB)...
[    1.344130] tsc: Refined TSC clocksource calibration: 2808.014 MHz
[    1.345186] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2879d389cc4, max_idle_ns: 440795324601 ns
[    1.792320] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
--8<---------------cut here---------------end--------------->8---

Reverting ba39d3280759fb20a4ff0092a5afac721a5fa769 (“gnu: linux-libre:
Update to 4.16.4.”) solves the problem for me.

Normally what we see is something like this:

--8<---------------cut here---------------start------------->8---
loading '/gnu/store/4i02v1lvw9nwzcfs5l1il9grhkw17iqm-linux-vm-loader'...
environment variable `PATH' set to `/gnu/store/1n3cvnxrz1m6jqc09d9m1grbfbs91kmv-qemu-minimal-2.12.0/bin:/gnu/store/h8670f6j1rawhhaz8wvdy5gv79lny0am-parted-3.2/sbin:/gnu/store/nm305rpb2mvridkyj4l3636nc9ql4lf9-e2fsprogs-1.43.6/bin:/gnu/store/nm305rpb2mvridkyj4l3636nc9ql4lf9-e2fsprogs-1.43.6/sbin:/gnu/store/0sq2nflm42x0znkv44add0gk82khkcb6-dosfstools-4.1/sbin:/gnu/store/0hl513mnpkhszm2hjai2w9cxmpxs0vgq-sed-4.4/bin:/gnu/store/1h44pkgdd7n6s3i2vjh54awsvfmc219j-grep-3.1/bin:/gnu/store/kgzvfby2ggi1xawsh5vjh4s93qk2dp9k-coreutils-8.28/bin:/gnu/store/k7r2m2wgj8x8jjhccwjsiimp0dlzxb7i-findutils-4.6.0/bin:/gnu/store/ahxc89r6npzf2bbl8yg5vdjicskzzjf3-gawk-4.1.4/bin'
creating partition table with 2 partitions (20.0 MiB, 40.0 MiB)...
Warning: The resulting partition is not properly aligned for best performance.
creating ext4 partition...
mke2fs 1.43.6 (29-Aug-2017)
ext2fs_check_if_mount: Can't check if filesystem is mounted due to missing mtab file while determining whether /dev/vda1 is mounted.
Creating filesystem with 20480 1k blocks and 5136 inodes
Filesystem UUID: bb8da7fb-04b2-437e-887a-f468da978a40
Superblock backups stored on blocks: 
	8193

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

[    0.979292] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
…
--8<---------------cut here---------------end--------------->8---

So it seems that the VM image hangs when trying to run Parted.

Mark, are you aware of anything particular that could explain this?
Perhaps a regression/incompatible change in 9P or virtio?

Ludo’.

Severity set to 'important' from 'normal' Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Sun, 29 Apr 2018 15:56:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Mon, 30 Apr 2018 08:25:01 GMT) Full text and rfc822 format available.

Message #13 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: George myglc2 Clemmer <myglc2 <at> gmail.com>
Cc: help-guix <help-guix <at> gnu.org>, Eric Bavier <ericbavier <at> centurylink.net>,
 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Mon, 30 Apr 2018 04:23:02 -0400

George myglc2 Clemmer <myglc2 <at> gmail.com> writes:

> On 04/26/2018 at 03:37 Eric Bavier writes:
>
>> Hi George,
>>
>> I've seen the same lately with 'guix system vm':
>>
>> On Wed, 25 Apr 2018 21:39:30 -0400
>> George myglc2 Clemmer <myglc2 <at> gmail.com> wrote:
>>
>>> 'guix system vm-image' hangs on guix (GNU Guix) 0.14.0.4439-eb88e built
>>> from source today. Please see log except below. It worked on guix (GNU
>>> Guix) 0.14.0.4196-2c92a. built <2018-04-18 Wed>

[...]

>> I bisected the git repo to commit:
>>
>> ```
>> commit ba39d3280759fb20a4ff0092a5afac721a5fa769
>> Author: Mark H Weaver <mhw <at> netris.org>
>> Date:   Tue Apr 24 16:51:45 2018 -0400
>>
>>     gnu: linux-libre: Update to 4.16.4.
>>
>>     * gnu/packages/linux.scm (%linux-libre-version): Update to 4.16.4.
>>     (%linux-libre-hash): Update hash.

FYI, I just pushed commit 8c21c64e59d3f4d223d8aeef91f06fdde7de1ab7 which
updates linux-libre to 4.16.6.  Would you like to see if the problem
still persists?

       Mark

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Mon, 30 Apr 2018 14:19:01 GMT) Full text and rfc822 format available.

Message #16 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw <at> netris.org>
Cc: George myglc2 Clemmer <myglc2 <at> gmail.com>, help-guix <help-guix <at> gnu.org>,
 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Mon, 30 Apr 2018 16:18:20 +0200

[Message part 1 (text/plain, inline)]

Hello Mark,

Mark H Weaver <mhw <at> netris.org> skribis:

> FYI, I just pushed commit 8c21c64e59d3f4d223d8aeef91f06fdde7de1ab7 which
> updates linux-libre to 4.16.6.  Would you like to see if the problem
> still persists?

It does.

I instrumented the code to see what happens.  Opening /dev/vda works
fine, invoking “parted --version” works, so it’s really just ‘parted’
that hangs when asked to do the real thing (with 0% CPU consumption.)

However, if we strace it as with the patch below, it succeeds.  It
doesn’t seem to be a simple timing issue because adding (sleep 2) before
the parted invocation doesn’t make a difference.

To be continued…

Ludo’.

[Message part 2 (text/x-patch, inline)]

diff --git a/gnu/build/vm.scm b/gnu/build/vm.scm
index 527b4c495..b73c8f912 100644
--- a/gnu/build/vm.scm
+++ b/gnu/build/vm.scm
@@ -242,9 +242,21 @@ actual /dev name based on DEVICE."
                                      partition-size)
                             partitions)
                        ", "))
-  (apply invoke "parted" "--script"
+  (pk 'parted (which "parted"))
+  (pk 'parted2 (stat (which "parted")))
+  (pk 'opts device partitions offset)
+  (pk 'actual-options "parted" "--script"
          device "mklabel" label-type
          (options partitions offset))
+  (pk 'stat device (stat device))
+  (invoke "parted" "--version")
+  ;; (pk 'open-device device (open-fdes device O_WRONLY))
+  (pk 'parted)
+  (apply invoke "strace" "-s" "123"
+         "parted" "--script"
+         device "mklabel" label-type
+         (options partitions offset))
+  (pk 'parted-done)
 
   ;; Set the 'device' field of each partition.
   (reverse
diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
index 09a11af86..ba54a71dd 100644
--- a/gnu/system/vm.scm
+++ b/gnu/system/vm.scm
@@ -304,7 +304,7 @@ the image."
                       (ice-9 binary-ports))
 
          (let ((inputs
-                '#$(append (list qemu parted e2fsprogs dosfstools)
+                '#$(append (list strace qemu parted e2fsprogs dosfstools)
                            (map canonical-package
                                 (list sed grep coreutils findutils gawk))
                            (if register-closures? (list guix) '())))

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Mon, 30 Apr 2018 16:38:01 GMT) Full text and rfc822 format available.

Message #19 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: George myglc2 Clemmer <myglc2 <at> gmail.com>, 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Mon, 30 Apr 2018 12:36:03 -0400

ludo <at> gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw <at> netris.org> skribis:
>
>> FYI, I just pushed commit 8c21c64e59d3f4d223d8aeef91f06fdde7de1ab7 which
>> updates linux-libre to 4.16.6.  Would you like to see if the problem
>> still persists?
>
> It does.

I suggest doing a git bisect on the 4.16.y branch of the linux-stable
git repository, between the 4.16.3 and 4.16.4 tags.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/?h=linux-4.16.y

      Mark

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Fri, 18 May 2018 14:23:01 GMT) Full text and rfc822 format available.

Message #22 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: Clément Lassieur <clement <at> lassieur.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Ludovic Courtès <ludo <at> gnu.org>,
 George myglc2 Clemmer <myglc2 <at> gmail.com>, 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Fri, 18 May 2018 16:22:05 +0200

Hi,

Mark H Weaver <mhw <at> netris.org> writes:

> ludo <at> gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw <at> netris.org> skribis:
>>
>>> FYI, I just pushed commit 8c21c64e59d3f4d223d8aeef91f06fdde7de1ab7 which
>>> updates linux-libre to 4.16.6.  Would you like to see if the problem
>>> still persists?
>>
>> It does.
>
> I suggest doing a git bisect on the 4.16.y branch of the linux-stable
> git repository, between the 4.16.3 and 4.16.4 tags.

I did it, here is the result:

--8<---------------cut here---------------start------------->8---
cd8d7a5778a4abf76ee8fe8f1bfcf78976029f8d is the first bad commit
commit cd8d7a5778a4abf76ee8fe8f1bfcf78976029f8d
Author: Theodore Ts'o <tytso <at> mit.edu>
Date:   Wed Apr 11 13:27:52 2018 -0400

    random: fix crng_ready() test
    
    commit 43838a23a05fbd13e47d750d3dfd77001536dd33 upstream.
    
    The crng_init variable has three states:
    
    0: The CRNG is not initialized at all
    1: The CRNG has a small amount of entropy, hopefully good enough for
       early-boot, non-cryptographical use cases
    2: The CRNG is fully initialized and we are sure it is safe for
       cryptographic use cases.
    
    The crng_ready() function should only return true once we are in the
    last state.  This addresses CVE-2018-1108.
    
    Reported-by: Jann Horn <jannh <at> google.com>
    Fixes: e192be9d9a30 ("random: replace non-blocking pool...")
    Cc: stable <at> kernel.org # 4.8+
    Signed-off-by: Theodore Ts'o <tytso <at> mit.edu>
    Reviewed-by: Jann Horn <jannh <at> google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh <at> linuxfoundation.org>

:040000 040000 4bdc62a4604b0a754614e8601ce794f316e83622 da682af7d9d779f193becb7757a8722a71e5d85d M      drivers
--8<---------------cut here---------------end--------------->8---

Clément

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Fri, 18 May 2018 16:40:02 GMT) Full text and rfc822 format available.

Message #25 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Clément Lassieur <clement <at> lassieur.org>
Cc: Mark H Weaver <mhw <at> netris.org>, George myglc2 Clemmer <myglc2 <at> gmail.com>,
 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Fri, 18 May 2018 18:39:41 +0200

Hi!

Clément Lassieur <clement <at> lassieur.org> skribis:

> Mark H Weaver <mhw <at> netris.org> writes:

[...]

>> I suggest doing a git bisect on the 4.16.y branch of the linux-stable
>> git repository, between the 4.16.3 and 4.16.4 tags.
>
> I did it, here is the result:

Impressive, thanks a lot!  Did you have a script to do that or
something?

> cd8d7a5778a4abf76ee8fe8f1bfcf78976029f8d is the first bad commit
> commit cd8d7a5778a4abf76ee8fe8f1bfcf78976029f8d
> Author: Theodore Ts'o <tytso <at> mit.edu>
> Date:   Wed Apr 11 13:27:52 2018 -0400
>
>     random: fix crng_ready() test
>     
>     commit 43838a23a05fbd13e47d750d3dfd77001536dd33 upstream.
>     
>     The crng_init variable has three states:
>     
>     0: The CRNG is not initialized at all
>     1: The CRNG has a small amount of entropy, hopefully good enough for
>        early-boot, non-cryptographical use cases
>     2: The CRNG is fully initialized and we are sure it is safe for
>        cryptographic use cases.
>     
>     The crng_ready() function should only return true once we are in the
>     last state.  This addresses CVE-2018-1108.

What happens I think is that libparted/fs/r/fat/fat.c:fat_create calls
‘generate_random_uint32’, which in turn indirectly calls
‘get_random_bytes’ from libuuid (package ‘e2fsprogs’).  And I suppose
that’s where it blocks, even though it’s supposed to be using
/dev/urandom, which is not supposed to block.

To be continued…

Ludo’.

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Fri, 18 May 2018 22:04:01 GMT) Full text and rfc822 format available.

Message #28 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: Clément Lassieur <clement <at> lassieur.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Mark H Weaver <mhw <at> netris.org>, George myglc2 Clemmer <myglc2 <at> gmail.com>,
 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Sat, 19 May 2018 00:03:50 +0200

Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi!
>
> Clément Lassieur <clement <at> lassieur.org> skribis:
>
>> Mark H Weaver <mhw <at> netris.org> writes:
>
> [...]
>
>>> I suggest doing a git bisect on the 4.16.y branch of the linux-stable
>>> git repository, between the 4.16.3 and 4.16.4 tags.
>>
>> I did it, here is the result:
>
> Impressive, thanks a lot!  Did you have a script to do that or
> something?

You're welcome!  I just used the "git bisect" command on a local git
repository that was pointed to by linux-libre <origin>.  Automating it
with "git bisect run" wasn't worth it because there were only 8 tests to
do (log2(196)).

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Fri, 18 May 2018 22:15:02 GMT) Full text and rfc822 format available.

Message #31 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Clément Lassieur <clement <at> lassieur.org>
Cc: Mark H Weaver <mhw <at> netris.org>, George myglc2 Clemmer <myglc2 <at> gmail.com>,
 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Sat, 19 May 2018 00:13:58 +0200

Clément Lassieur <clement <at> lassieur.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Hi!
>>
>> Clément Lassieur <clement <at> lassieur.org> skribis:
>>
>>> Mark H Weaver <mhw <at> netris.org> writes:
>>
>> [...]
>>
>>>> I suggest doing a git bisect on the 4.16.y branch of the linux-stable
>>>> git repository, between the 4.16.3 and 4.16.4 tags.
>>>
>>> I did it, here is the result:
>>
>> Impressive, thanks a lot!  Did you have a script to do that or
>> something?
>
> You're welcome!  I just used the "git bisect" command on a local git
> repository that was pointed to by linux-libre <origin>.  Automating it
> with "git bisect run" wasn't worth it because there were only 8 tests to
> do (log2(196)).

Nice, I was expecting it to require more tests.  Thanks again!

Ludo’.

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Tue, 22 May 2018 16:25:02 GMT) Full text and rfc822 format available.

Message #34 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Clément Lassieur <clement <at> lassieur.org>
Cc: Mark H Weaver <mhw <at> netris.org>, George myglc2 Clemmer <myglc2 <at> gmail.com>,
 Leo Famulari <leo <at> famulari.name>, 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Tue, 22 May 2018 18:23:35 +0200

[Message part 1 (text/plain, inline)]

Hello!

(+Cc: Leo for randomness input.)

ludo <at> gnu.org (Ludovic Courtès) skribis:

> Clément Lassieur <clement <at> lassieur.org> skribis:
>
>> Mark H Weaver <mhw <at> netris.org> writes:
>
> [...]
>
>>> I suggest doing a git bisect on the 4.16.y branch of the linux-stable
>>> git repository, between the 4.16.3 and 4.16.4 tags.
>>
>> I did it, here is the result:
>
> Impressive, thanks a lot!  Did you have a script to do that or
> something?
>
>> cd8d7a5778a4abf76ee8fe8f1bfcf78976029f8d is the first bad commit
>> commit cd8d7a5778a4abf76ee8fe8f1bfcf78976029f8d
>> Author: Theodore Ts'o <tytso <at> mit.edu>
>> Date:   Wed Apr 11 13:27:52 2018 -0400
>>
>>     random: fix crng_ready() test
>>     
>>     commit 43838a23a05fbd13e47d750d3dfd77001536dd33 upstream.
>>     
>>     The crng_init variable has three states:
>>     
>>     0: The CRNG is not initialized at all
>>     1: The CRNG has a small amount of entropy, hopefully good enough for
>>        early-boot, non-cryptographical use cases
>>     2: The CRNG is fully initialized and we are sure it is safe for
>>        cryptographic use cases.
>>     
>>     The crng_ready() function should only return true once we are in the
>>     last state.  This addresses CVE-2018-1108.
>
> What happens I think is that libparted/fs/r/fat/fat.c:fat_create calls
> ‘generate_random_uint32’, which in turn indirectly calls
> ‘get_random_bytes’ from libuuid (package ‘e2fsprogs’).  And I suppose
> that’s where it blocks, even though it’s supposed to be using
> /dev/urandom, which is not supposed to block.

I was looking at the wrong code: we’re using libuuid from util-linux,
which in turn uses getrandom(2).  Since it doesn’t pass the
GRND_NONBLOCK flag, it ends up blocking forever because too little
entropy is available in the VM.

The following patches work around that:

  1. Parted now explicitly uses getrandom(2) with GRND_NONBLOCK instead
     of libuuid’s ‘uuid_generate’, which is good enough for this
     purpose.  I’ll submit it upstream.

  2. e2fsprogs is changed to use a libuuid that passes GRND_NONBLOCK.
     It does the job, but it’s quite inelegant.

Another approach I looked at was to seed the VM’s PRNG from /dev/hwrng,
which I thought was connected to the host via ‘virtio-rng-pci’, but I
get ENODEV while trying to read from /dev/hwrng in the guest like this:

[Message part 2 (text/x-patch, inline)]

diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
index b6561dfc7..3bfd6b4ca 100644
--- a/gnu/system/vm.scm
+++ b/gnu/system/vm.scm
@@ -319,6 +319,18 @@ the image."
 
            (set-path-environment-variable "PATH" '("bin" "sbin") inputs)
 
+           ;; Try writing from /dev/hwrng into /dev/urandom.
+           ;; It seems that the file /dev/hwrng always exists, even
+           ;; when there is no hardware random number generator
+           ;; available. So, we handle a failed read or any other error
+           ;; reported by the operating system.
+           (let ((buf (call-with-input-file "/dev/hwrng"
+                        (lambda (hwrng)
+                          (get-bytevector-n hwrng 512)))))
+             (call-with-output-file "/dev/urandom"
+               (lambda (urandom)
+                 (put-bytevector urandom buf))))
+
            (let* ((graphs     '#$(match inputs
                                    (((names . _) ...)
                                     names)))

[Message part 3 (text/plain, inline)]

Am I missing something, Leo?

In short, we’re almost done with this bug, now we just need to make sure
we have a reasonable fix.

Thanks,
Ludo’.

[0003-gnu-e2fsprogs-Use-libuuid-with-non-blocking-random.patch (text/x-patch, attachment)]

[0001-gnu-parted-Use-weak-non-blocking-randomness-for-FAT-.patch (text/x-patch, attachment)]

[0002-vm-Use-a-deterministic-file-system-UUID-in-shared-st.patch (text/x-patch, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Tue, 22 May 2018 18:46:02 GMT) Full text and rfc822 format available.

Message #37 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: Efraim Flashner <efraim <at> flashner.co.il>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: George myglc2 Clemmer <myglc2 <at> gmail.com>,
 Clément Lassieur <clement <at> lassieur.org>,
 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Tue, 22 May 2018 21:45:52 +0300

[Message part 1 (text/plain, inline)]

On Tue, May 22, 2018 at 06:23:35PM +0200, Ludovic Courtès wrote:
> Hello!
> 
> (+Cc: Leo for randomness input.)

<snip>

> 
> Another approach I looked at was to seed the VM’s PRNG from /dev/hwrng,
> which I thought was connected to the host via ‘virtio-rng-pci’, but I
> get ENODEV while trying to read from /dev/hwrng in the guest like this:
> 

> diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
> index b6561dfc7..3bfd6b4ca 100644
> --- a/gnu/system/vm.scm
> +++ b/gnu/system/vm.scm
> @@ -319,6 +319,18 @@ the image."
>  
>             (set-path-environment-variable "PATH" '("bin" "sbin") inputs)
>  
> +           ;; Try writing from /dev/hwrng into /dev/urandom.
> +           ;; It seems that the file /dev/hwrng always exists, even
> +           ;; when there is no hardware random number generator
> +           ;; available. So, we handle a failed read or any other error
> +           ;; reported by the operating system.
> +           (let ((buf (call-with-input-file "/dev/hwrng"
> +                        (lambda (hwrng)
> +                          (get-bytevector-n hwrng 512)))))
> +             (call-with-output-file "/dev/urandom"
> +               (lambda (urandom)
> +                 (put-bytevector urandom buf))))
> +
>             (let* ((graphs     '#$(match inputs
>                                     (((names . _) ...)
>                                      names)))

> 
> Am I missing something, Leo?
> 
> In short, we’re almost done with this bug, now we just need to make sure
> we have a reasonable fix.
> 
> Thanks,
> Ludo’.
> 

I can confirm that on the aarch64 machines I've tested there is a
/dev/hwrng.

Another option for entropy is haveged. http://issihosts.com/haveged/
It's not very actively maintained, but it is used by other distros for
randomness on headless machines.

-- 
Efraim Flashner   <efraim <at> flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Tue, 22 May 2018 22:58:01 GMT) Full text and rfc822 format available.

Message #40 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: George myglc2 Clemmer <myglc2 <at> gmail.com>, Leo Famulari <leo <at> famulari.name>,
 Clément Lassieur <clement <at> lassieur.org>,
 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Tue, 22 May 2018 18:55:50 -0400

Hi Ludovic,

ludo <at> gnu.org (Ludovic Courtès) writes:

> I was looking at the wrong code: we’re using libuuid from util-linux,
> which in turn uses getrandom(2).  Since it doesn’t pass the
> GRND_NONBLOCK flag, it ends up blocking forever because too little
> entropy is available in the VM.
>
> The following patches work around that:
>
>   1. Parted now explicitly uses getrandom(2) with GRND_NONBLOCK instead
>      of libuuid’s ‘uuid_generate’, which is good enough for this
>      purpose.  I’ll submit it upstream.
>
>   2. e2fsprogs is changed to use a libuuid that passes GRND_NONBLOCK.
>      It does the job, but it’s quite inelegant.

I'm uncomfortable with this approach.  I think that it's proper for uuid
generation to block if there's not enough entropy.

> Another approach I looked at was to seed the VM’s PRNG from /dev/hwrng,
> which I thought was connected to the host via ‘virtio-rng-pci’, but I
> get ENODEV while trying to read from /dev/hwrng in the guest like this:

I much prefer this approach, where we seed the VM's PRNG from another
source.

> From 8aa37a4124db90a9991485477d1af85677c7fa1b Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo <at> gnu.org>
> Date: Tue, 22 May 2018 17:36:35 +0200
> Subject: [PATCH 2/3] vm: Use a deterministic file system UUID in shared-store
>  VMs.
> 
> * gnu/system/vm.scm (system-qemu-image/shared-store): Pass
>  #:file-system-uuid to 'qemu-image'.
> ---
>  gnu/system/vm.scm | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
> index 09a11af86..b6561dfc7 100644
> --- a/gnu/system/vm.scm
> +++ b/gnu/system/vm.scm
> @@ -752,6 +752,13 @@ with the host.
>  When FULL-BOOT? is true, return an image that does a complete boot sequence,
>  bootloaded included; thus, make a disk image that contains everything the
>  bootloader refers to: OS kernel, initrd, bootloader data, etc."
> +  (define root-uuid
> +    ;; Use a fixed UUID so that 'mke2fs' doesn't wait for strong randomness
> +    ;; while generating a UUID.  See <https://bugs.gnu.org/31268>.
> +    ;; XXX: Actually it doesn't help since 'mke2fs' calls 'uuid_generate'
> +    ;; anyway.

How about fixing mke2fs so that if you specify a fixed UUID, it won't
try to generate a random one?  Would that be sufficient?

      Thanks,
        Mark

Information forwarded to bug-guix <at> gnu.org:
bug#31268; Package guix. (Wed, 23 May 2018 07:29:01 GMT) Full text and rfc822 format available.

Message #43 received at 31268 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw <at> netris.org>
Cc: George myglc2 Clemmer <myglc2 <at> gmail.com>, Leo Famulari <leo <at> famulari.name>,
 Clément Lassieur <clement <at> lassieur.org>,
 31268 <at> debbugs.gnu.org
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Wed, 23 May 2018 09:27:54 +0200

Hi Mark,

Mark H Weaver <mhw <at> netris.org> skribis:

> ludo <at> gnu.org (Ludovic Courtès) writes:
>
>> I was looking at the wrong code: we’re using libuuid from util-linux,
>> which in turn uses getrandom(2).  Since it doesn’t pass the
>> GRND_NONBLOCK flag, it ends up blocking forever because too little
>> entropy is available in the VM.
>>
>> The following patches work around that:
>>
>>   1. Parted now explicitly uses getrandom(2) with GRND_NONBLOCK instead
>>      of libuuid’s ‘uuid_generate’, which is good enough for this
>>      purpose.  I’ll submit it upstream.
>>
>>   2. e2fsprogs is changed to use a libuuid that passes GRND_NONBLOCK.
>>      It does the job, but it’s quite inelegant.
>
> I'm uncomfortable with this approach.  I think that it's proper for uuid
> generation to block if there's not enough entropy.

Yeah, I think #1 is OK because it’s “just” FAT serial numbers, but #2
could be more problematic.

We could keep these package variants private though, and use them only
for the purposes of ‘qemu-image’ if we don’t find a better solution.

>> From 8aa37a4124db90a9991485477d1af85677c7fa1b Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo <at> gnu.org>
>> Date: Tue, 22 May 2018 17:36:35 +0200
>> Subject: [PATCH 2/3] vm: Use a deterministic file system UUID in shared-store
>>  VMs.
>> 
>> * gnu/system/vm.scm (system-qemu-image/shared-store): Pass
>>  #:file-system-uuid to 'qemu-image'.
>> ---
>>  gnu/system/vm.scm | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>> 
>> diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
>> index 09a11af86..b6561dfc7 100644
>> --- a/gnu/system/vm.scm
>> +++ b/gnu/system/vm.scm
>> @@ -752,6 +752,13 @@ with the host.
>>  When FULL-BOOT? is true, return an image that does a complete boot sequence,
>>  bootloaded included; thus, make a disk image that contains everything the
>>  bootloader refers to: OS kernel, initrd, bootloader data, etc."
>> +  (define root-uuid
>> +    ;; Use a fixed UUID so that 'mke2fs' doesn't wait for strong randomness
>> +    ;; while generating a UUID.  See <https://bugs.gnu.org/31268>.
>> +    ;; XXX: Actually it doesn't help since 'mke2fs' calls 'uuid_generate'
>> +    ;; anyway.
>
> How about fixing mke2fs so that if you specify a fixed UUID, it won't
> try to generate a random one?  Would that be sufficient?

mke2fs generates two UUIDs:

--8<---------------cut here---------------start------------->8---
	if (fs_uuid) { … }
        else
		uuid_generate(fs->super->s_uuid);

        [...]

	uuid_generate((unsigned char *) fs->super->s_hash_seed);
--8<---------------cut here---------------end--------------->8---

The first ‘uuid_generate’ call is omitted when we pass “-U UUID” but the
second one isn’t.  I don’t think we can get around it.

Thanks,
Ludo’.

Reply sent to ludo <at> gnu.org (Ludovic Courtès):
You have taken responsibility. (Wed, 23 May 2018 08:24:02 GMT) Full text and rfc822 format available.

Notification sent to George myglc2 Clemmer <myglc2 <at> gmail.com>:
bug acknowledged by developer. (Wed, 23 May 2018 08:24:02 GMT) Full text and rfc822 format available.

Message #48 received at 31268-done <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw <at> netris.org>
Cc: 31268-done <at> debbugs.gnu.org, George myglc2 Clemmer <myglc2 <at> gmail.com>,
 Clément Lassieur <clement <at> lassieur.org>,
 Leo Famulari <leo <at> famulari.name>
Subject: Re: bug#31268: 'guix system vm-image' hangs
Date: Wed, 23 May 2018 10:23:28 +0200

Bah, it turns out virtio-rng was simply not enabled, so commit
d282388789a21ae031035e9fd8e94a72463d0c96 fixes that.

Done!

Ludo'.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 20 Jun 2018 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 7 years and 185 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #31268 'guix system vm-image' hangs

GNU bug report logs - #31268
'guix system vm-image' hangs