GNU bug report logs - #43565
cuirass: Fibers scheduling blocked.

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: guix; Reported by: Mathieu Othacehe <othacehe@HIDDEN>; Done: Mathieu Othacehe <othacehe@HIDDEN>; Maintainer for guix is bug-guix@HIDDEN.

Message received at 43565-done <at> debbugs.gnu.org:


Received: (at 43565-done) by debbugs.gnu.org; 27 Oct 2020 18:03:45 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Oct 27 14:03:45 2020
Received: from localhost ([127.0.0.1]:45741 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1kXTJx-0003GS-1A
	for submit <at> debbugs.gnu.org; Tue, 27 Oct 2020 14:03:45 -0400
Received: from eggs.gnu.org ([209.51.188.92]:52532)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1kXTJv-0003GF-Ij
 for 43565-done <at> debbugs.gnu.org; Tue, 27 Oct 2020 14:03:43 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:55426)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1kXTJp-000821-Ve; Tue, 27 Oct 2020 14:03:37 -0400
Received: from [2a01:e0a:19b:d9a0:50c1:4a69:300e:4531] (port=40508 helo=cervin)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <othacehe@HIDDEN>)
 id 1kXTJp-00058a-I1; Tue, 27 Oct 2020 14:03:37 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Ludovic =?utf-8?Q?Court=C3=A8s?= <ludo@HIDDEN>
Subject: Re: bug#43565: cuirass: Fibers scheduling blocked.
References: <87eemtzr1q.fsf@HIDDEN> <87r1qc27mo.fsf@HIDDEN>
 <874kmmzd92.fsf@HIDDEN> <871rhpqgjy.fsf@HIDDEN>
 <87v9excbj8.fsf@HIDDEN> <87tuuh9cxe.fsf@HIDDEN>
Date: Tue, 27 Oct 2020 19:03:35 +0100
In-Reply-To: <87tuuh9cxe.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?=
 =?utf-8?Q?s?= message of "Mon, 26 Oct 2020 17:20:29 +0100")
Message-ID: <87sg9zh7go.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 43565-done
Cc: 43565-done <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hey,

> Yeah please go ahead if you want, or let me know if you=E2=80=99d rather =
let me
> apply it.

I applied your patch, thanks! I'm closing this one, because there's
nothing much that can be done right now.

Thanks,

Mathieu




Notification sent to Mathieu Othacehe <othacehe@HIDDEN>:
bug acknowledged by developer. Full text available.
Reply sent to Mathieu Othacehe <othacehe@HIDDEN>:
You have taken responsibility. Full text available.

Message received at 43565 <at> debbugs.gnu.org:


Received: (at 43565) by debbugs.gnu.org; 26 Oct 2020 16:20:48 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Oct 26 12:20:48 2020
Received: from localhost ([127.0.0.1]:41023 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1kX5El-00031h-Qa
	for submit <at> debbugs.gnu.org; Mon, 26 Oct 2020 12:20:48 -0400
Received: from eggs.gnu.org ([209.51.188.92]:60334)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1kX5Ej-00031T-Mp
 for 43565 <at> debbugs.gnu.org; Mon, 26 Oct 2020 12:20:46 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:57841)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <ludo@HIDDEN>) id 1kX5Ee-0000GP-CG
 for 43565 <at> debbugs.gnu.org; Mon, 26 Oct 2020 12:20:40 -0400
Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=33550 helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <ludo@HIDDEN>)
 id 1kX5EV-0006xg-3B; Mon, 26 Oct 2020 12:20:37 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#43565: cuirass: Fibers scheduling blocked.
References: <87eemtzr1q.fsf@HIDDEN> <87r1qc27mo.fsf@HIDDEN>
 <874kmmzd92.fsf@HIDDEN> <871rhpqgjy.fsf@HIDDEN>
 <87v9excbj8.fsf@HIDDEN>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: 5 Brumaire an 229 de la =?utf-8?Q?R=C3=A9volution?=
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Mon, 26 Oct 2020 17:20:29 +0100
In-Reply-To: <87v9excbj8.fsf@HIDDEN> (Mathieu Othacehe's message of "Mon, 26
 Oct 2020 15:22:19 +0100")
Message-ID: <87tuuh9cxe.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 43565
Cc: 43565 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hello!

Mathieu Othacehe <othacehe@HIDDEN> skribis:

>> But does Cuirass create file descriptors as O_NONBLOCK?  This has to be
>> done explicitly, Fibers won=E2=80=99t do it for us.  As it turns out, th=
e answer
>> is no, in at least one important case: the connection to the daemon
>> (untested patch below).
>>
>> While GC is running, Cuirass typically sends =E2=80=98build-derivations=
=E2=80=99 RPCs
>> and they block until the GC lock is released.  That can lead to the
>> situation above: a bunch of threads blocked in =E2=80=98read=E2=80=99 fr=
om their daemon
>> socket, waiting for the RPC reply.  OTOH, =E2=80=98build-derivations=E2=
=80=99 RPCs are
>> made from a fresh thread created by =E2=80=98build-derivations&=E2=80=99.
>
> While I agree not opening file descriptors with O_NONBLOCK is an issue,
> build-derivations is called in a separate thread. Blocking this separate
> thread should not block the fibers.

Agreed.

> Now the question is why there's no fetching while the GC is running? The
> answer is that "latest-repository-commit" called by "fetch-input" will
> block the only fiber dedicated to fetching. Having multiple fibers
> trying to fetch wouldn't solve anything because fetching requires some
> building from the daemon.

Exactly: when the GC lock is taken, =E2=80=98latest-repository-commit=E2=80=
=99 makes an
=E2=80=98add-to-store=E2=80=99 RPC, and that RPC blocks.  Thus the whole fe=
tch fiber is
blocked.

The patch should address this case.  That said, nothing useful happens
anyway when the GC lock is held, so it wouldn=E2=80=99t have any practical
effect.

I believe there are other cases where RPCs can be slow, for example when
there=E2=80=99s contention on the sqlite database.  Perhaps that could help=
 a
bit there although again, it=E2=80=99s a situation where nothing useful can
happen.

> Long story short, I think we can apply your patch that can be useful to
> prevent fibers talking directly to the daemon to block, even though it
> won't help for this particular hang, that will only be fixed the GC time
> will be reduced to something more acceptable.

Yeah please go ahead if you want, or let me know if you=E2=80=99d rather le=
t me
apply it.

Thanks!

Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#43565; Package guix. Full text available.

Message received at 43565 <at> debbugs.gnu.org:


Received: (at 43565) by debbugs.gnu.org; 26 Oct 2020 14:22:30 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Oct 26 10:22:30 2020
Received: from localhost ([127.0.0.1]:40695 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1kX3OH-00041t-TN
	for submit <at> debbugs.gnu.org; Mon, 26 Oct 2020 10:22:30 -0400
Received: from eggs.gnu.org ([209.51.188.92]:48384)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1kX3OF-00041h-T9
 for 43565 <at> debbugs.gnu.org; Mon, 26 Oct 2020 10:22:28 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:55404)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1kX3OA-0006fR-K1; Mon, 26 Oct 2020 10:22:22 -0400
Received: from [2a01:e0a:19b:d9a0:50c1:4a69:300e:4531] (port=54364 helo=cervin)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <othacehe@HIDDEN>)
 id 1kX3O9-0000yp-FK; Mon, 26 Oct 2020 10:22:21 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Ludovic =?utf-8?Q?Court=C3=A8s?= <ludo@HIDDEN>
Subject: Re: bug#43565: cuirass: Fibers scheduling blocked.
References: <87eemtzr1q.fsf@HIDDEN> <87r1qc27mo.fsf@HIDDEN>
 <874kmmzd92.fsf@HIDDEN> <871rhpqgjy.fsf@HIDDEN>
Date: Mon, 26 Oct 2020 15:22:19 +0100
In-Reply-To: <871rhpqgjy.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?=
 =?utf-8?Q?s?= message of "Fri, 23 Oct 2020 14:21:05 +0200")
Message-ID: <87v9excbj8.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 43565
Cc: 43565 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hey!

Many thanks for your help, you rock!

> But does Cuirass create file descriptors as O_NONBLOCK?  This has to be
> done explicitly, Fibers won=E2=80=99t do it for us.  As it turns out, the=
 answer
> is no, in at least one important case: the connection to the daemon
> (untested patch below).
>
> While GC is running, Cuirass typically sends =E2=80=98build-derivations=
=E2=80=99 RPCs
> and they block until the GC lock is released.  That can lead to the
> situation above: a bunch of threads blocked in =E2=80=98read=E2=80=99 fro=
m their daemon
> socket, waiting for the RPC reply.  OTOH, =E2=80=98build-derivations=E2=
=80=99 RPCs are
> made from a fresh thread created by =E2=80=98build-derivations&=E2=80=99.

While I agree not opening file descriptors with O_NONBLOCK is an issue,
build-derivations is called in a separate thread. Blocking this separate
thread should not block the fibers.

For instance, the following program:

--8<---------------cut here---------------start------------->8---
(use-modules (fibers)
             (ice-9 threads))

(run-fibers
 (lambda ()
   (spawn-fiber
    (lambda ()
      (call-with-new-thread
       (lambda ()
         (read (car (pipe)))))))
   (spawn-fiber
    (lambda ()
      (while #t
        (format #t "alive~%")
        (sleep 1)))))
 #:hz 10
 #:drain? #t)
--8<---------------cut here---------------end--------------->8---

keeps displaying "alive" even if the spawned thread is blocking. I guess
that's also what's happening in Cuirass because the log shows that some
fibers are scheduled while the GC is running.

Now the question is why there's no fetching while the GC is running? The
answer is that "latest-repository-commit" called by "fetch-input" will
block the only fiber dedicated to fetching. Having multiple fibers
trying to fetch wouldn't solve anything because fetching requires some
building from the daemon.

Long story short, I think we can apply your patch that can be useful to
prevent fibers talking directly to the daemon to block, even though it
won't help for this particular hang, that will only be fixed the GC time
will be reduced to something more acceptable.

Thanks,

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#43565; Package guix. Full text available.

Message received at 43565 <at> debbugs.gnu.org:


Received: (at 43565) by debbugs.gnu.org; 23 Oct 2020 12:21:19 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Oct 23 08:21:18 2020
Received: from localhost ([127.0.0.1]:56258 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1kVw4M-0000HG-FI
	for submit <at> debbugs.gnu.org; Fri, 23 Oct 2020 08:21:18 -0400
Received: from eggs.gnu.org ([209.51.188.92]:37490)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1kVw4J-0000Gy-1q
 for 43565 <at> debbugs.gnu.org; Fri, 23 Oct 2020 08:21:17 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:48171)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <ludo@HIDDEN>) id 1kVw4D-0001Ke-PV
 for 43565 <at> debbugs.gnu.org; Fri, 23 Oct 2020 08:21:09 -0400
Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=55132 helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <ludo@HIDDEN>)
 id 1kVw4B-0003Jj-Vh; Fri, 23 Oct 2020 08:21:08 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#43565: cuirass: Fibers scheduling blocked.
References: <87eemtzr1q.fsf@HIDDEN> <87r1qc27mo.fsf@HIDDEN>
 <874kmmzd92.fsf@HIDDEN>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: 2 Brumaire an 229 de la =?utf-8?Q?R=C3=A9volution?=
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Fri, 23 Oct 2020 14:21:05 +0200
In-Reply-To: <874kmmzd92.fsf@HIDDEN> (Mathieu Othacehe's message of "Thu, 22
 Oct 2020 13:55:21 +0200")
Message-ID: <871rhpqgjy.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 43565
Cc: 43565 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Good afternoon fearless hacker!

Mathieu Othacehe <othacehe@HIDDEN> skribis:

>> =E2=80=98process-build-log=E2=80=99 in Cuirass uses =E2=80=98read-line/n=
on-blocking=E2=80=99 to read a
>> line from the log port of =E2=80=98build-derivations&=E2=80=99.  If that=
 really is
>> non-blocking (and I think it is), then we should be fine?
>>
>> We should attach GDB to Cuirass next time to see what=E2=80=99s blocking.
>
> Cuirass is currently hanging probably due to the same issue. I saved a
> GDB core dump in /home/mathieu/core.76483.

For those following along at home, we have 60 threads in there.

A couple of threads are blocked in =E2=80=98clock_nanosleep=E2=80=99, which=
 I considered
fishy at first:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007fe26752f7a1 in __GI___clock_nanosleep (clock_id=3D-612010, flags=
=3D0, req=3D0x7fdf6b40d140, rem=3D0x7fdf6b40d140)
    at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
#1  0x00007fe267a0166d in ffi_call_unix64 ()
   from /gnu/store/bw15z9kh9c65ycc2vbhl2izwfwfva7p1-libffi-3.3/lib/libffi.s=
o.7
#2  0x00007fe2679ffac0 in ffi_call_int () from /gnu/store/bw15z9kh9c65ycc2v=
bhl2izwfwfva7p1-libffi-3.3/lib/libffi.so.7
#3  0x00007fe267af5f2e in scm_i_foreign_call (cif_scm=3D<optimized out>, po=
inter_scm=3D<optimized out>,=20
    errno_ret=3Derrno_ret@entry=3D0x7fe25a8e86cc, argv=3D0x7fe25b955df0) at=
 foreign.c:1073
#4  0x00007fe267b64a84 in foreign_call (thread=3D0x7fe26741e480, cif=3D<opt=
imized out>, pointer=3D<optimized out>)
    at vm.c:1282
#5  0x00007fe2505253e0 in ?? ()
#6  0x00007fe26741e480 in ?? ()
#7  0x00007fe267bd7620 in ?? () from /gnu/store/0w76khfspfy8qmcpjya41chj3bg=
fcy0k-guile-3.0.4/lib/libguile-3.0.so.1
#8  0x00007fe26741e480 in ?? ()
#9  0x00007fe267b1043b in scm_jit_enter_mcode (thread=3D0x7fe26741e480, thr=
ead@entry=3D0x7fe2505253b0,=20
    mcode=3D0x7fe25052627c "L\215\243\210") at jit.c:5852
#10 0x00007fe267b6bc24 in vm_regular_engine (thread=3D0x7fe2505253b0) at vm=
-engine.c:415
#11 0x00007fe267b6c5b5 in scm_call_n (proc=3Dproc@entry=3D#<unmatched-tag 2=
0045>, argv=3Dargv@entry=3D0x0,=20
    nargs=3Dnargs@entry=3D0) at vm.c:1608
#12 0x00007fe267ae8ae9 in scm_call_0 (proc=3Dproc@entry=3D#<unmatched-tag 2=
0045>) at eval.c:490
#13 0x00007fe267adb138 in scm_call_with_unblocked_asyncs (proc=3D#<unmatche=
d-tag 20045>) at async.c:406
--8<---------------cut here---------------end--------------->8---

This can only come from (fibers posix-clocks) via
=E2=80=98with-interrupts=E2=80=99=E2=80=94probably OK.

Then there=E2=80=99s a couple of threads block in =E2=80=98pthread_cond_wai=
t=E2=80=99, but
that=E2=80=99s presumably also Fibers internals.

Then there=E2=80=99s a whole bunch of threads stuck in =E2=80=98read=E2=80=
=99:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007fe267a180a4 in __libc_read (fd=3D80, buf=3Dbuf@entry=3D0x7fe22b0=
bb8f0, nbytes=3Dnbytes@entry=3D8)
    at ../sysdeps/unix/sysv/linux/read.c:26
#1  0x00007fe267af69c7 in fport_read (port=3D<optimized out>, dst=3D<optimi=
zed out>, start=3D<optimized out>, count=3D8)
    at fports.c:597
#2  0x00007fe267b30542 in trampoline_to_c_read (port=3D#<port #<port-type f=
ile 7fe25fb4db40> 7fe22b7b9880>,=20
    dst=3D"#<vu8vector>" =3D {...}, start=3D0, count=3D8) at ports.c:266
#3  0x00007fe2580cb5fe in ?? ()
#4  0x00007fe267431d80 in ?? ()
#5  0x00007fe267bd7620 in ?? () from /gnu/store/0w76khfspfy8qmcpjya41chj3bg=
fcy0k-guile-3.0.4/lib/libguile-3.0.so.1
#6  0x00007fe267431d80 in ?? ()
#7  0x00007fe267b1043b in scm_jit_enter_mcode (thread=3D0x7fe267431d80, thr=
ead@entry=3D0x7fe2580cb5d0,=20
    mcode=3D0x7fe229340690 "H\203\350(I\211\314I)\304I\203\374\060\017\205T=
\003") at jit.c:5852
#8  0x00007fe267b6b8e9 in vm_regular_engine (thread=3D0x7fe2580cb5d0) at vm=
-engine.c:360
#9  0x00007fe267b6c5b5 in scm_call_n (proc=3Dproc@entry=3D#<unmatched-tag 2=
0045>, argv=3Dargv@entry=3D0x0,=20
    nargs=3Dnargs@entry=3D0) at vm.c:1608
#10 0x00007fe267ae8ae9 in scm_call_0 (proc=3Dproc@entry=3D#<unmatched-tag 2=
0045>) at eval.c:490
#11 0x00007fe267adb138 in scm_call_with_unblocked_asyncs (proc=3D#<unmatche=
d-tag 20045>) at async.c:406
--8<---------------cut here---------------end--------------->8---

=E2=80=98trampoline_to_c_read=E2=80=99 is known as =E2=80=98port-read=E2=80=
=99 in Scheme, so I think the
call above comes from =E2=80=98read-bytes=E2=80=99 in (ice-9 suspendable-po=
rts).

Normally, this file descriptor is O_NONBLOCK, and thus =E2=80=98fport_read=
=E2=80=99
immediately returns EAGAIN, so =E2=80=98trampoline_to_c_read=E2=80=99 retur=
ns #false.

But does Cuirass create file descriptors as O_NONBLOCK?  This has to be
done explicitly, Fibers won=E2=80=99t do it for us.  As it turns out, the a=
nswer
is no, in at least one important case: the connection to the daemon
(untested patch below).

While GC is running, Cuirass typically sends =E2=80=98build-derivations=E2=
=80=99 RPCs
and they block until the GC lock is released.  That can lead to the
situation above: a bunch of threads blocked in =E2=80=98read=E2=80=99 from =
their daemon
socket, waiting for the RPC reply.  OTOH, =E2=80=98build-derivations=E2=80=
=99 RPCs are
made from a fresh thread created by =E2=80=98build-derivations&=E2=80=99.

There are probably other situations where the daemon replies slowly.
For instance, =E2=80=98fetch-input=E2=80=99 can remain stuck until GC is ov=
er.

WDYT?

Thanks for investigating!

Ludo=E2=80=99.


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline

diff --git a/src/cuirass/base.scm b/src/cuirass/base.scm
index 5a0c826..6db43c4 100644
--- a/src/cuirass/base.scm
+++ b/src/cuirass/base.scm
@@ -36,6 +36,9 @@
   #:use-module ((guix config) #:select (%state-directory))
   #:use-module (git)
   #:use-module (ice-9 binary-ports)
+  #:use-module ((ice-9 suspendable-ports)
+                #:select (current-read-waiter
+                          current-write-waiter))
   #:use-module (ice-9 format)
   #:use-module (ice-9 match)
   #:use-module (ice-9 popen)
@@ -79,7 +82,12 @@
   ;; currently closes in a 'dynamic-wind' handler, which means it would close
   ;; the store at each context switch.  Remove this when the real 'with-store'
   ;; has been fixed.
-  (let ((store (open-connection)))
+  (let* ((store  (open-connection))
+         (socket (store-connection-socket store)))
+    ;; Mark SOCKET as non-blocking so Fibers can schedule the way it wants.
+    (let ((flags (fcntl socket F_GETFL)))
+      (fcntl socket F_SETFL (logior O_NONBLOCK flags)))
+
     (unwind-protect
      ;; Always set #:keep-going? so we don't stop on the first build failure.
      ;; Set #:print-build-trace explicitly to make sure 'process-build-log'
@@ -422,7 +430,12 @@ Essentially this procedure inverts the inversion-of-control that
           (lambda ()
             (guard (c ((store-error? c)
                        (atomic-box-set! result c)))
-              (parameterize ((current-build-output-port output))
+              (parameterize ((current-build-output-port output)
+
+                             ;; STORE's socket is O_NONBLOCK but since we're
+                             ;; not in a fiber, disable Fiber's handlers.
+                             (current-read-waiter #f)
+                             (current-write-waiter #f))
                 (let ((x (build-derivations store lst)))
                   (atomic-box-set! result x))))
             (close-port output))

--=-=-=--




Information forwarded to bug-guix@HIDDEN:
bug#43565; Package guix. Full text available.

Message received at 43565 <at> debbugs.gnu.org:


Received: (at 43565) by debbugs.gnu.org; 22 Oct 2020 11:55:36 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Oct 22 07:55:36 2020
Received: from localhost ([127.0.0.1]:51628 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1kVZBv-0007Tq-OS
	for submit <at> debbugs.gnu.org; Thu, 22 Oct 2020 07:55:36 -0400
Received: from eggs.gnu.org ([209.51.188.92]:35496)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1kVZBu-0007Tc-Q2
 for 43565 <at> debbugs.gnu.org; Thu, 22 Oct 2020 07:55:35 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:41807)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1kVZBm-0008OH-4F; Thu, 22 Oct 2020 07:55:28 -0400
Received: from [2a01:e0a:19b:d9a0:50c1:4a69:300e:4531] (port=44984 helo=cervin)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <othacehe@HIDDEN>)
 id 1kVZBi-0005mD-L9; Thu, 22 Oct 2020 07:55:23 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Ludovic =?utf-8?Q?Court=C3=A8s?= <ludo@HIDDEN>
Subject: Re: bug#43565: cuirass: Fibers scheduling blocked.
References: <87eemtzr1q.fsf@HIDDEN> <87r1qc27mo.fsf@HIDDEN>
Date: Thu, 22 Oct 2020 13:55:21 +0200
In-Reply-To: <87r1qc27mo.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?=
 =?utf-8?Q?s?= message of "Mon, 05 Oct 2020 14:13:19 +0200")
Message-ID: <874kmmzd92.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 43565
Cc: 43565 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hey Ludo!

> =E2=80=98process-build-log=E2=80=99 in Cuirass uses =E2=80=98read-line/no=
n-blocking=E2=80=99 to read a
> line from the log port of =E2=80=98build-derivations&=E2=80=99.  If that =
really is
> non-blocking (and I think it is), then we should be fine?
>
> We should attach GDB to Cuirass next time to see what=E2=80=99s blocking.

Cuirass is currently hanging probably due to the same issue. I saved a
GDB core dump in /home/mathieu/core.76483.

Could use your help finding the guilty thread :)

Thanks,

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#43565; Package guix. Full text available.

Message received at 43565 <at> debbugs.gnu.org:


Received: (at 43565) by debbugs.gnu.org; 5 Oct 2020 12:13:29 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Oct 05 08:13:29 2020
Received: from localhost ([127.0.0.1]:48191 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1kPPMv-0008Ns-9f
	for submit <at> debbugs.gnu.org; Mon, 05 Oct 2020 08:13:29 -0400
Received: from eggs.gnu.org ([209.51.188.92]:51556)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1kPPMt-0008Ng-Fm
 for 43565 <at> debbugs.gnu.org; Mon, 05 Oct 2020 08:13:28 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:57493)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <ludo@HIDDEN>) id 1kPPMo-0003hU-8I
 for 43565 <at> debbugs.gnu.org; Mon, 05 Oct 2020 08:13:22 -0400
Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (port=35036 helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <ludo@HIDDEN>)
 id 1kPPMn-0007CA-5P; Mon, 05 Oct 2020 08:13:21 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#43565: cuirass: Fibers scheduling blocked.
References: <87eemtzr1q.fsf@HIDDEN>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: 14 =?utf-8?Q?Vend=C3=A9miaire?= an 229 de la
 =?utf-8?Q?R=C3=A9volution?=
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Mon, 05 Oct 2020 14:13:19 +0200
In-Reply-To: <87eemtzr1q.fsf@HIDDEN> (Mathieu Othacehe's message of "Tue, 22
 Sep 2020 18:58:57 +0200")
Message-ID: <87r1qc27mo.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 43565
Cc: 43565 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

Mathieu Othacehe <othacehe@HIDDEN> skribis:

> Today between 04:04 and 10:36 no inputs were fetched. Fetching is
> supposed to happen every 5 minutes. This seem to be correlated to the
> duration of the garbage collection happening on berlin.
>
> 2020-09-22T04:04:23 fetching input 'core-updates' of spec 'core-updates-c=
ore-updates'
> 2020-09-22T04:04:25 build succeeded: '/gnu/store/c7m6jxdkyjs7m5ynavagjwgp=
172a3xzv-partition.img.drv'
> waiting for the big garbage collector lock...
> ...
> 2020-09-22T10:36:02 fetching input 'guix' of spec 'guix-master'
>
> A potential cause is described here:
> https://issues.guix.gnu.org/43552#1.

=E2=80=98process-build-log=E2=80=99 in Cuirass uses =E2=80=98read-line/non-=
blocking=E2=80=99 to read a
line from the log port of =E2=80=98build-derivations&=E2=80=99.  If that re=
ally is
non-blocking (and I think it is), then we should be fine?

We should attach GDB to Cuirass next time to see what=E2=80=99s blocking.

Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#43565; Package guix. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 22 Sep 2020 16:59:05 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Sep 22 12:59:05 2020
Received: from localhost ([127.0.0.1]:33208 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1kKldB-0006A9-7c
	for submit <at> debbugs.gnu.org; Tue, 22 Sep 2020 12:59:05 -0400
Received: from lists.gnu.org ([209.51.188.17]:47154)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1kKldA-0006A2-BH
 for submit <at> debbugs.gnu.org; Tue, 22 Sep 2020 12:59:04 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:33150)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>) id 1kKld8-0005PN-NF
 for bug-guix@HIDDEN; Tue, 22 Sep 2020 12:59:04 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:53822)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>) id 1kKld8-0006Mz-EK
 for bug-guix@HIDDEN; Tue, 22 Sep 2020 12:59:02 -0400
Received: from [2a01:e0a:19b:d9a0:9d9d:97cc:d92a:8ac0] (port=46448 helo=cervin)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <othacehe@HIDDEN>) id 1kKld5-0002Cm-4Q
 for bug-guix@HIDDEN; Tue, 22 Sep 2020 12:59:00 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: bug-guix@HIDDEN 
Subject: cuirass: Fibers scheduling blocked.
Date: Tue, 22 Sep 2020 18:58:57 +0200
Message-ID: <87eemtzr1q.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hello,

Today between 04:04 and 10:36 no inputs were fetched. Fetching is
supposed to happen every 5 minutes. This seem to be correlated to the
duration of the garbage collection happening on berlin.

--8<---------------cut here---------------start------------->8---
2020-09-22T04:04:23 fetching input 'core-updates' of spec 'core-updates-core-updates'
2020-09-22T04:04:25 build succeeded: '/gnu/store/c7m6jxdkyjs7m5ynavagjwgp172a3xzv-partition.img.drv'
waiting for the big garbage collector lock...
...
2020-09-22T10:36:02 fetching input 'guix' of spec 'guix-master'
--8<---------------cut here---------------end--------------->8---

A potential cause is described here:
https://issues.guix.gnu.org/43552#1.

Thanks,

Mathieu

-- 
https://othacehe.org




Acknowledgement sent to Mathieu Othacehe <othacehe@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-guix@HIDDEN. Full text available.
Report forwarded to bug-guix@HIDDEN:
bug#43565; Package guix. Full text available.
Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.
Last modified: Tue, 27 Oct 2020 18:15:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.