GNU bug report logs - #61839
'guix offload' occasionally hangs on berlin, hitting max-silent-time

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: guix; Severity: important; Reported by: Ludovic Courtès <ludo@HIDDEN>; dated Mon, 27 Feb 2023 10:38:02 UTC; Maintainer for guix is bug-guix@HIDDEN.

Message received at 61839 <at> debbugs.gnu.org:


Received: (at 61839) by debbugs.gnu.org; 17 Apr 2023 22:55:35 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Apr 17 18:55:35 2023
Received: from localhost ([127.0.0.1]:57154 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1poXl1-0008RT-9Q
	for submit <at> debbugs.gnu.org; Mon, 17 Apr 2023 18:55:35 -0400
Received: from eggs.gnu.org ([209.51.188.92]:59892)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1poXky-0008RG-Oq
 for 61839 <at> debbugs.gnu.org; Mon, 17 Apr 2023 18:55:33 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>)
 id 1poXks-00028Y-Mo; Mon, 17 Apr 2023 18:55:26 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=8/7uXQWdXdN9WS7JoSCh8JZZYXXLVe40koqpM5e5VME=; b=aTPpXzp+RWfa7BNSly2K
 +yUSDOjIyFQQZ/9OpPb1q2hTNEOnMzcG0rPmBHsJ+e66snk1NTQbsZEpEMiw9gCXhbt4PVsK7xSS9
 /zY2ZkoVZYE87cAsjsV5E94TqGFq97hCeTuZWA65Q6RtaiiOehKX+IthU6vCsXxo9gqsHasrHvLPT
 Cn8bsqjVLfuQy4GS2p/8nsBK7BpuqqDrn9Olp73aFW6/Ov0890SYmMETfFQ2+qs9dPOBsJSGHHx8j
 NiCMnWV3A5VsFCNNqvfjOj2KOUDCCG4KJFSJIOJ8BmfENmsx+x7lKq+MP3Q6Y0uK8CUzbxp2iO4d0
 3fNjr8ApSklBrg==;
Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>)
 id 1poXks-0006Hn-9q; Mon, 17 Apr 2023 18:55:26 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: 61839 <at> debbugs.gnu.org
Subject: Re: bug#61839: 'guix offload' occasionally hangs on berlin, hitting
 max-silent-time
References: <87a60zo0le.fsf@HIDDEN> <87edoicvn3.fsf@HIDDEN>
Date: Tue, 18 Apr 2023 00:55:22 +0200
In-Reply-To: <87edoicvn3.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?=
 =?utf-8?Q?s?= message of "Mon, 17 Apr 2023 18:37:36 +0200")
Message-ID: <871qkiazl1.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 61839
Cc: Mathieu Othacehe <othacehe@HIDDEN>, Andreas Enge <andreas@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

Ludovic Court=C3=A8s <ludo@HIDDEN> skribis:

> Turns out that guix-daemon on build machines is running with =E2=80=98--d=
ebug=E2=80=99
> starting with maintenance.git commit
> 72d8a62466b71eda657b52ca17e99f7189ea18ad (=E2=80=9Csysadmin: build-machin=
es:
> Enable guix-daemon debug output.=E2=80=9D)
>
> The immediate fix is to undo that=E2=80=94I=E2=80=99ll do that later.

Done:

  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=3Dcbe75=
edae348bf0aa235e93ebeb58fe9db14fa7a

I redeployed the build machines (on the same Guix commit as before).
The 4 machines that host a childhurd are still being redeployed though.

> The other one is to swallow =E2=80=98current-error-port=E2=80=99 in
> =E2=80=98store-import-channel=E2=80=99.

Done:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=3D166a3e3fdeaa5279b=
2dd5000088f913c05af9558

Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#61839; Package guix. Full text available.

Message received at 61839 <at> debbugs.gnu.org:


Received: (at 61839) by debbugs.gnu.org; 17 Apr 2023 16:37:47 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Apr 17 12:37:47 2023
Received: from localhost ([127.0.0.1]:55923 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1poRrO-0003Qz-U0
	for submit <at> debbugs.gnu.org; Mon, 17 Apr 2023 12:37:47 -0400
Received: from eggs.gnu.org ([209.51.188.92]:43922)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1poRrN-0003Qk-5C
 for 61839 <at> debbugs.gnu.org; Mon, 17 Apr 2023 12:37:46 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>)
 id 1poRrH-0000k7-2a; Mon, 17 Apr 2023 12:37:39 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=aJyv1vdXysdiIG2f34skQEptpu3vNlbj/J4KbAfaBwY=; b=b/0jMb6NNKHoJE9X2ERk
 +QTDkQDLBE6LQRE/nk+bywrxJioUKx8Cj9zkd6ncBbxf1bEVpLE/bV2Jnb6qxqvSJFNCtpQYRaEyT
 aI29M70gLVyLr+Vh079J1xVWsJP8jSkaG6SqJ5Lg5CH3h14/SzrbqzHlkNLLe9YiGXN9PkU0GsMCp
 r0NEPeXNFPAp3/sWrhpe95pSkUBN6uxg/Sx+n/z7h0iJOQccO9A3wNrWAJNddYQ9ai8a7PUUZvUvX
 T+fNBDNHQdqE4KeedO+prLDj2AiVArlDwMYHxhoeNK8lH9WxshDcgTDhNb8QDJ9tVnJ2twJwLjI8c
 YgZMmaJmtcMd5w==;
Received: from [193.50.110.118] (helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>)
 id 1poRrG-0006xR-Kf; Mon, 17 Apr 2023 12:37:38 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: 61839 <at> debbugs.gnu.org
Subject: Re: bug#61839: 'guix offload' occasionally hangs on berlin, hitting
 max-silent-time
References: <87a60zo0le.fsf@HIDDEN>
Date: Mon, 17 Apr 2023 18:37:36 +0200
In-Reply-To: <87a60zo0le.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22?=
 =?utf-8?Q?'s?= message of "Mon, 27 Feb 2023 11:37:49 +0100")
Message-ID: <87edoicvn3.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 61839
Cc: Mathieu Othacehe <othacehe@HIDDEN>, Andreas Enge <andreas@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi!

Ludovic Court=C3=A8s <ludo@HIDDEN> skribis:

> We often get evaluation errors for =E2=80=98core-updates=E2=80=99, where =
the build log
> reveals that this is due to a build timeout while building the Guix
> instance that will be used for the evaluation.  Example:

[...]

> exporting path `/gnu/store/m2hyx7qk2j1bsvpj1qv89701klv8kqkk-guix-command'
> exporting path `/gnu/store/vlm7401d5jvna6j7qdxhcr581cd8g8vh-guix-daemon-1=
.4.0-4.01fd830'
> exporting path `/gnu/store/zv3c0anfj085d2lyxh4g4aahhczmcaj8-guix-daemon'
> exporting path `/gnu/store/f26rw4haizf2h126k9ja9wk31cm49mj0-guix-17bd0243=
3'
> exporting path `/gnu/store/wc736va72zl2zz36817ahf3kc5fkfr69-inferior-scri=
pt.scm'
> building of `/gnu/store/yq9r44lic8cn4nphksbi5s3ppwfcsh1n-guile.drv' timed=
 out after 3600 seconds of silence

With help from Andreas, I was able to reproduce it on berlin by running:

  guix build /gnu/store/b22580wj63wk4a3i09n22wa60y36kaa5-openjdk-16.0.2.drv

Timeout happens while =E2=80=98guix offload=E2=80=99 is sending build requi=
sites to the
target machine.  At that point, the client =E2=80=98guix offload=E2=80=99 p=
rocess is
stuck like so:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007f1fd7ad7d6f in __GI___poll (fds=3D0x766b00, nfds=3D1, timeout=3D=
3594784) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f1fcc220de7 in ssh_poll_ctx_dopoll () from target:/gnu/store/f3h=
n75p2jjc94jfxfwsgf96s2wz8kaql-libssh-0.9.6/lib/libssh.so.4
#2  0x00007f1fcc221f0c in ssh_handle_packets () from target:/gnu/store/f3hn=
75p2jjc94jfxfwsgf96s2wz8kaql-libssh-0.9.6/lib/libssh.so.4
#3  0x00007f1fcc221fdb in ssh_handle_packets_termination ()
   from target:/gnu/store/f3hn75p2jjc94jfxfwsgf96s2wz8kaql-libssh-0.9.6/lib=
/libssh.so.4
#4  0x00007f1fcc2034bb in channel_write_common () from target:/gnu/store/f3=
hn75p2jjc94jfxfwsgf96s2wz8kaql-libssh-0.9.6/lib/libssh.so.4
#5  0x00007f1fcc28b15b in ?? () from target:/gnu/store/rjhq6cz7xbck4l7zar6g=
a3ygph2chwzr-guile-ssh-0.16.0/lib/libguile-ssh.so
#6  0x00007f1fd7cf4f34 in scm_i_write_bytes (port=3D0x7f1fc556ba00, src=3D0=
x7f1fc3c27000, start=3D0, count=3D65536) at ports.c:2858
#7  0x00007f1fd7cff546 in scm_put_bytevector (port=3D0x7f1fc556ba00, bv=3D0=
x7f1fc3c27000, start=3D<optimized out>, count=3D<optimized out>)
    at r6rs-ports.c:676
#8  0x00007f1fc5ac5197 in ?? ()
#9  0x000000000069731c in ?? ()
#10 0x00007f1fc5acdf71 in ?? ()
#11 0x0000000000697318 in ?? ()
#12 0x00007f1fd7cd2ccc in scm_jit_enter_mcode (thread=3D0x7f1fd73b1d80, mco=
de=3D0x69bf14 "\034;\004") at jit.c:6038
#13 0x00007f1fd7d2874f in vm_regular_engine (thread=3D0x7f1fd73b1d80) at vm=
-engine.c:546
#14 0x00007f1fd7d355d9 in scm_call_n (proc=3D<optimized out>, argv=3D<optim=
ized out>, nargs=3D1) at vm.c:1610
#15 0x00007f1fd7c9d457 in scm_primitive_eval (exp=3D<optimized out>, exp@en=
try=3D0x7f1fcfab8cf0) at eval.c:671
#16 0x00007f1fd7ca34b6 in scm_eval (exp=3D0x7f1fcfab8cf0, module_or_state=
=3D0x7f1fcfaacc80) at eval.c:705
#17 0x00007f1fd7d073b6 in scm_shell (argc=3D11, argv=3D0x6a9fc0) at script.=
c:357
#18 0x00007f1fd7cb249c in invoke_main_func (body_data=3D0x7fffd2c0d370) at =
init.c:312
#19 0x00007f1fd7c9b85a in c_body (d=3D0x7fffd2c0d290) at continuations.c:430
#20 0x00007f1fd7d28326 in vm_regular_engine (thread=3D0x7f1fd73b1d80) at vm=
-engine.c:972
#21 0x00007f1fd7d355d9 in scm_call_n (proc=3D<optimized out>, argv=3D<optim=
ized out>, nargs=3D2) at vm.c:1610
#22 0x00007f1fd7c9d09a in scm_call_2 (proc=3D<optimized out>, arg1=3D<optim=
ized out>, arg2=3D<optimized out>) at eval.c:503
#23 0x00007f1fd7d53742 in scm_c_with_exception_handler.constprop.0 (type=3D=
0x404, handler_data=3Dhandler_data@entry=3D0x7fffd2c0d220,
    thunk_data=3Dthunk_data@entry=3D0x7fffd2c0d220, thunk=3D<optimized out>=
, handler=3D<optimized out>) at exceptions.c:170
#24 0x00007f1fd7d2588f in scm_c_catch (tag=3D<optimized out>, body=3D<optim=
ized out>, body_data=3D<optimized out>, handler=3D<optimized out>,
    handler_data=3D<optimized out>, pre_unwind_handler=3D<optimized out>, p=
re_unwind_handler_data=3D0x7f1fcfa10040) at throw.c:168
#25 0x00007f1fd7c9de66 in scm_i_with_continuation_barrier (pre_unwind_handl=
er=3D0x7f1fd7c9db80 <pre_unwind_handler>,
    pre_unwind_handler_data=3D0x7f1fcfa10040, handler_data=3D0x7fffd2c0d290=
, handler=3D0x7f1fd7ca48b0 <c_handler>, body_data=3D0x7fffd2c0d290,
    body=3D0x7f1fd7c9b850 <c_body>) at continuations.c:368
#26 scm_c_with_continuation_barrier (func=3D<optimized out>, data=3D<optimi=
zed out>) at continuations.c:464
#27 0x00007f1fd7d24b39 in with_guile (base=3D0x7fffd2c0d318, data=3D0x7fffd=
2c0d340) at threads.c:645
#28 0x00007f1fd7bfb0ba in GC_call_with_stack_base () from target:/gnu/store=
/2lczkxbdbzh4gk7wh91bzrqrk7h5g1dl-libgc-8.0.4/lib/libgc.so.1
#29 0x00007f1fd7d1d8b8 in scm_i_with_guile (dynamic_state=3D<optimized out>=
, data=3D<optimized out>, func=3D<optimized out>) at threads.c:688
#30 scm_with_guile (func=3D<optimized out>, data=3D<optimized out>) at thre=
ads.c:694
#31 0x00007f1fd7cbb025 in scm_boot_guile (argc=3D<optimized out>, argv=3D<o=
ptimized out>, main_func=3D<optimized out>, closure=3D<optimized out>)
    at init.c:295
#32 0x0000000000401191 in ?? ()
#33 0x00007f1fd7a117dd in __libc_start_main (main=3D0x401110, argc=3D8, arg=
v=3D0x7fffd2c0d4a8, init=3D<optimized out>, fini=3D<optimized out>,
    rtld_fini=3D<optimized out>, stack_end=3D0x7fffd2c0d498) at ../csu/libc=
-start.c:332
#34 0x00000000004011da in ?? ()
(gdb) p *fds
$1 =3D {fd =3D 15, events =3D 1, revents =3D 0}
(gdb) shell ls -l /proc/62345/fd/15
lrwx------ 1 root root 64 Apr 17 17:42 /proc/62345/fd/15 -> 'socket:[323897=
5699]'
--8<---------------cut here---------------end--------------->8---

The SSH connection is still live, and on the server side I see the =E2=80=
=98guix
repl -t machine=E2=80=99 process stuck in:

--8<---------------cut here---------------start------------->8---
write(2, "linking \342\200\230/gnu/store/0gbj38wswwxq50i68ci973fgi2l531b4-o=
penjdk-16.0.2-checkout/test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVec=
t.java\342\200\231 to \342\200\230/gnu/store/.links/03zscrls9431zipkpx1sn45=
0llvf5hkhvbkpjk57j22f7d2nb3vs\342\200\231\n", 221
--8<---------------cut here---------------end--------------->8---

(This message comes from optimise-store.cc:192.)

This suggests that =E2=80=98guix repl -t machine=E2=80=99 has filled its ou=
tput buffer;
its output isn=E2=80=99t being read so it gets stuck in that write(2) call.=
  Its
parent =E2=80=98sshd=E2=80=99 process is stuck in:

--8<---------------cut here---------------start------------->8---
ppoll([{fd=3D3, events=3DPOLLIN}, {fd=3D4, events=3D0}, {fd=3D10, events=3D=
POLLIN}, {fd=3D12, events=3DPOLLIN}, {fd=3D11, events=3DPOLLOUT}], 5, NULL,=
 [], 8^C
--8<---------------cut here---------------end--------------->8---

One hypothesis is that when transferring store items with many files,
such as openjdk-checkout here, guix-daemon on the remote machine outputs
many such =E2=80=9Clinking =E2=80=A6 to =E2=80=A6=E2=80=9D messages; if the=
 client doesn=E2=80=99t read them,
then we end up with a deadlock like this, with the server-side process
stuck writing more messages.

However, that =E2=80=9Clinking =E2=80=A6=E2=80=9D message is supposed to be=
 written only at
non-default debugging levels:

    printMsg(lvlTalkative, format("linking =E2=80=98%1%=E2=80=99 to =E2=80=
=98%2%=E2=80=99") % path % linkPath);

Turns out that guix-daemon on build machines is running with =E2=80=98--deb=
ug=E2=80=99
starting with maintenance.git commit
72d8a62466b71eda657b52ca17e99f7189ea18ad (=E2=80=9Csysadmin: build-machines:
Enable guix-daemon debug output.=E2=80=9D)

The immediate fix is to undo that=E2=80=94I=E2=80=99ll do that later.

The other one is to swallow =E2=80=98current-error-port=E2=80=99 in
=E2=80=98store-import-channel=E2=80=99.

To be continued=E2=80=A6

Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#61839; Package guix. Full text available.
Changed bug title to ''guix offload' occasionally hangs on berlin, hitting max-silent-time' from '[Cuirass] Build timeouts during evaluations' Request was from Ludovic Courtès <ludo@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 61839 <at> debbugs.gnu.org:


Received: (at 61839) by debbugs.gnu.org; 17 Mar 2023 13:21:13 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Mar 17 09:21:13 2023
Received: from localhost ([127.0.0.1]:43984 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pdA1B-0003dT-Dv
	for submit <at> debbugs.gnu.org; Fri, 17 Mar 2023 09:21:13 -0400
Received: from hera.aquilenet.fr ([185.233.100.1]:37342)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <andreas@HIDDEN>) id 1pdA1A-0003dH-3T
 for 61839 <at> debbugs.gnu.org; Fri, 17 Mar 2023 09:21:12 -0400
Received: from localhost (localhost [127.0.0.1])
 by hera.aquilenet.fr (Postfix) with ESMTP id 00C73CD3;
 Fri, 17 Mar 2023 14:21:06 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at hera.aquilenet.fr
Received: from hera.aquilenet.fr ([127.0.0.1])
 by localhost (hera.aquilenet.fr [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id zE2j33BYpVfW; Fri, 17 Mar 2023 14:21:05 +0100 (CET)
Received: from jurong (unknown [IPv6:2001:861:c4:f2f0::c64])
 by hera.aquilenet.fr (Postfix) with ESMTPSA id 6E43922F;
 Fri, 17 Mar 2023 14:21:05 +0100 (CET)
Date: Fri, 17 Mar 2023 14:21:03 +0100
From: Andreas Enge <andreas@HIDDEN>
To: Ludovic =?iso-8859-15?Q?Court=E8s?= <ludo@HIDDEN>, 61839 <at> debbugs.gnu.org
Subject: Re: Offloading problems on berlin
Message-ID: <ZBRpP/cDAYMQ5220@jurong>
References: <ZAtFp/uT6Z0N3XmN@jurong> <87pm9bmix7.fsf@HIDDEN>
 <ZBHJMMYvI0BhllzF@jurong> <ZBHctIg6Dk+Yn0Ik@jurong>
 <87wn3gereq.fsf_-_@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <87wn3gereq.fsf_-_@HIDDEN>
X-Spam-Score: -0.0 (/)
X-Debbugs-Envelope-To: 61839
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

Am Thu, Mar 16, 2023 at 02:50:53PM +0100 schrieb Ludovic Courtès:
> I occasionally offload from my laptop to machines at work but didn’t hit
> this bug, so we’ll have to see if it’s specific to berlin or not.

My impression is that in my case, the problem is not with ssh, but somehow
with the guix daemon. I tend to observe this:

guix offload: sending 1 store item (6,490 MiB) to '141.80.167.169'...
exporting path `/gnu/store/y4ipvkapf1gninaabwdl6pcz46c1frak-texlive-texmf-20210325'

Then jnettop shows data transfer of about 100 to 120MB/s during a bit over
a minute, which makes me think that the actual data transfer succeeds, but
that somehow the data is not registered in the store.
Then nothing tangible happens and offloading times out. Since I am
apparently not able to connect to the machines via ssh, I cannot have a
closer look at the receiving end.

Andreas





Information forwarded to bug-guix@HIDDEN:
bug#61839; Package guix. Full text available.

Message received at 61839 <at> debbugs.gnu.org:


Received: (at 61839) by debbugs.gnu.org; 3 Mar 2023 14:24:26 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Mar 03 09:24:26 2023
Received: from localhost ([127.0.0.1]:60008 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pY6Kg-0001Hm-3W
	for submit <at> debbugs.gnu.org; Fri, 03 Mar 2023 09:24:26 -0500
Received: from eggs.gnu.org ([209.51.188.92]:43604)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1pY6Kf-0001HX-2v
 for 61839 <at> debbugs.gnu.org; Fri, 03 Mar 2023 09:24:25 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>) id 1pY6KZ-0005Z5-Oc
 for 61839 <at> debbugs.gnu.org; Fri, 03 Mar 2023 09:24:19 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=GzKp0NqEk85EOFRYaglJljC5m1S1rWp78gqTujAoz2Q=; b=c+1c3bELKcPlVSBcHl+l
 8kY+JPE9u9JZ5u9W0yq/SG3GeCXV2lWsYWvpsTbGp8fYtFba3SoFbn12tPNrIXGyxa/yvWwkW3arU
 cVNiYUC0y4QzbKaTwXzVi6ZI/wnfBnrpr+HqXTFHg5uPvTMTlspHWAFY8qAGami835WW27h5Gyk9Z
 lz6uH7xYRCq2bEridoZ2DGF8aeDeqWFs6alzBTOsVDMY3UD9tkG2mPo5EDTzAZS5YMOkUFSbPhAPH
 bVkwVtLt2XVTMflMb9yXNQhFdRnyeHeTGpnlW/rRF4gbOeMyK/iYSOXSF7++z4hWwYcdVYl/si8MW
 uz2zPh3ZRHHmJg==;
Received: from [193.50.110.138] (helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>)
 id 1pY6KZ-00081x-5Y; Fri, 03 Mar 2023 09:24:19 -0500
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: 61839 <at> debbugs.gnu.org
Subject: Re: bug#61839: [Cuirass] Build timeouts during evaluations
References: <87a60zo0le.fsf@HIDDEN>
Date: Fri, 03 Mar 2023 15:24:16 +0100
In-Reply-To: <87a60zo0le.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22?=
 =?utf-8?Q?'s?= message of "Mon, 27 Feb 2023 11:37:49 +0100")
Message-ID: <87sfelx69b.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 61839
Cc: Mathieu Othacehe <othacehe@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Ludovic Court=C3=A8s <ludo@HIDDEN> skribis:

> exporting path `/gnu/store/f26rw4haizf2h126k9ja9wk31cm49mj0-guix-17bd0243=
3'
> exporting path `/gnu/store/wc736va72zl2zz36817ahf3kc5fkfr69-inferior-scri=
pt.scm'
> building of `/gnu/store/yq9r44lic8cn4nphksbi5s3ppwfcsh1n-guile.drv' timed=
 out after 3600 seconds of silence

I experienced something similar while building from the command line on
berlin, so it looks like an offloading problem.

Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#61839; Package guix. Full text available.
Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 27 Feb 2023 10:37:55 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Feb 27 05:37:55 2023
Received: from localhost ([127.0.0.1]:46185 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pWatH-0001jy-53
	for submit <at> debbugs.gnu.org; Mon, 27 Feb 2023 05:37:55 -0500
Received: from lists.gnu.org ([209.51.188.17]:34636)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1pWatF-0001jr-Fp
 for submit <at> debbugs.gnu.org; Mon, 27 Feb 2023 05:37:54 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>) id 1pWatE-0003vy-TT
 for bug-guix@HIDDEN; Mon, 27 Feb 2023 05:37:53 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>) id 1pWatE-0002qy-Li
 for bug-guix@HIDDEN; Mon, 27 Feb 2023 05:37:52 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:Date:Subject:To:From:in-reply-to:
 references; bh=gFYGy5VJddheBiQLAVNYVs12BiehJcywFkEnng7brHo=; b=mlst+oxlzfZNyN
 FEyT7dPdMoH4tbd5BA4JaoHCFMDSgIOffVyRp7WyO8Eu3IUzVQ5QKKU7niv+83c15P0flSm+oMSBk
 eJONWnNRokno3djViYoxH2CqITBHjiodZpj/IKc+bH/QV76NfSjk6l2SheOHOK/lOLkejwURNw/wC
 aMfY8KYpyop2Y2FQR5HF9ODoYGq1UJHI+fBOcWzISTsMd15wGAqSiRwTtY7tdfiJkNSGlf7SuicxT
 TMR8L8JmCy3fbKJE9IXjP9L8VfGB7Mk15mPcGq+B+JMoEW9goIN/e9VGkUYSE7O4fJiv/duei7XNB
 EZn62V/dvMM88BajuCGw==;
Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>) id 1pWatE-0007Ox-2o
 for bug-guix@HIDDEN; Mon, 27 Feb 2023 05:37:52 -0500
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: bug-guix@HIDDEN
Subject: [Cuirass] Build timeouts during evaluations
X-Debbugs-Cc: Mathieu Othacehe <othacehe@HIDDEN>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: Nonidi 9 =?utf-8?Q?Vent=C3=B4se?= an 231 de la
 =?utf-8?Q?R=C3=A9volution=2C?= jour du Marsault
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Mon, 27 Feb 2023 11:37:49 +0100
Message-ID: <87a60zo0le.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hello Guix!

We often get evaluation errors for =E2=80=98core-updates=E2=80=99, where th=
e build log
reveals that this is due to a build timeout while building the Guix
instance that will be used for the evaluation.  Example:

--8<---------------cut here---------------start------------->8---
building of `/gnu/store/x9dyzywwmsp8vmpvpx54yhcnz7rqin0l-asciidoc-9.1.0.drv=
' timed out after 3600 seconds of silence
cannot build derivation `/gnu/store/bl7aq8999lmp6flbyw3k656m522jmcl3-git-2.=
39.1.drv': 1 dependencies couldn't be built
cannot build derivation `/gnu/store/ldajjyvb4np1zq9l9s09j0dpsbpscpan-gnulib=
-2022-12-06-1.440b528.drv': 1 dependencies couldn't be built
cannot build derivation `/gnu/store/fdks470j7xjw3dc6fiyw15hdsxj6cdbc-guile-=
gnutls-3.7.11.drv': 1 dependencies couldn't be built
cannot build derivation `/gnu/store/71pl6a38bd0b81qgi45wz2ymvlv0x7xn-guix-c=
li-core.drv': 1 dependencies couldn't be built

[...]

cannot build derivation `/gnu/store/m7r73wypxhhmv8cqza1x8g2zyq9k1xpg-profil=
e.drv': 1 dependencies couldn't be built

[...]

ERROR:
  1. &store-protocol-error:
      message: "build of `/gnu/store/m7r73wypxhhmv8cqza1x8g2zyq9k1xpg-profi=
le.drv' failed"
      status: 101
--8<---------------cut here---------------end--------------->8---

(From <https://ci.guix.gnu.org/eval/233402/log/raw>.)

Likewise for the =E2=80=98guile=E2=80=99 jobset:

--8<---------------cut here---------------start------------->8---
exporting path `/gnu/store/m2hyx7qk2j1bsvpj1qv89701klv8kqkk-guix-command'
exporting path `/gnu/store/vlm7401d5jvna6j7qdxhcr581cd8g8vh-guix-daemon-1.4=
.0-4.01fd830'
exporting path `/gnu/store/zv3c0anfj085d2lyxh4g4aahhczmcaj8-guix-daemon'
exporting path `/gnu/store/f26rw4haizf2h126k9ja9wk31cm49mj0-guix-17bd02433'
exporting path `/gnu/store/wc736va72zl2zz36817ahf3kc5fkfr69-inferior-script=
.scm'
building of `/gnu/store/yq9r44lic8cn4nphksbi5s3ppwfcsh1n-guile.drv' timed o=
ut after 3600 seconds of silence
cannot build derivation `/gnu/store/nrcsvww8j24rjzyin90w76qgbkgphhvp-profil=
e.drv': 1 dependencies couldn't be built
Backtrace:
In ice-9/boot-9.scm:
  1752:10 12 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
          11 (apply-smob/0 #<thunk 7fd8cbe13340>)
In ice-9/boot-9.scm:
    724:2 10 (call-with-prompt _ _ #<procedure default-prompt-handle?>)
In ice-9/eval.scm:
    619:8  9 (_ #(#(#<directory (guile-user) 7fd8cbe16c80>)))
In cuirass/ui.scm:
   104:10  8 (run-cuirass-command _ . _)
In cuirass/scripts/evaluate.scm:
   105:27  7 (cuirass-evaluate _)
In ice-9/boot-9.scm:
  1752:10  6 (with-exception-handler _ _ #:unwind? _ # _)
In guix/store.scm:
   658:37  5 (thunk)
  2123:25  4 (run-with-store #<store-connection 256.99 7fd8cad09dc0> ?)
In cuirass/scripts/evaluate.scm:
     87:8  3 (_ _)
In guix/store.scm:
  1995:38  2 (_ #<store-connection 256.99 7fd8a33e6d70>)
  1421:15  1 (_ #<store-connection 256.99 7fd8a33e6d70> _ _)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
ERROR:
  1. &store-protocol-error:
      message: "build of `/gnu/store/nrcsvww8j24rjzyin90w76qgbkgphhvp-profi=
le.drv' failed"
      status: 101
--8<---------------cut here---------------end--------------->8---

Building this particular derivation is quick though:

--8<---------------cut here---------------start------------->8---
$ time guix build /gnu/store/yq9r44lic8cn4nphksbi5s3ppwfcsh1n-guile.drv --c=
heck
The following derivation will be built:
  /gnu/store/yq9r44lic8cn4nphksbi5s3ppwfcsh1n-guile.drv
building /gnu/store/yq9r44lic8cn4nphksbi5s3ppwfcsh1n-guile.drv...
(repl-version 0 1 1)
(values (value "/gnu/store/rgk1sc2zvllf7cgbyj5g28b9v2x6nvn3-guile/share/gui=
le/site/3.0"))
successfully built /gnu/store/yq9r44lic8cn4nphksbi5s3ppwfcsh1n-guile.drv
successfully built /gnu/store/yq9r44lic8cn4nphksbi5s3ppwfcsh1n-guile.drv
/gnu/store/rgk1sc2zvllf7cgbyj5g28b9v2x6nvn3-guile

real    0m1.253s
user    0m0.751s
sys     0m0.034s
--8<---------------cut here---------------end--------------->8---

So the problem must be elsewhere.

One thing is that this particular .drv is quite big:

--8<---------------cut here---------------start------------->8---
$ guix size /gnu/store/yq9r44lic8cn4nphksbi5s3ppwfcsh1n-guile.drv
store item                                                       total    s=
elf
/gnu/store/wjr1sqxci75jfkvwzmmvlgypkyp3fvqi-guix-17bd024           105.0   =
105.0  42.9%
/gnu/store/6b3fp2kwqxw7cvnh6jz9g49n9k2n5avy-guix-packages-base-source    36=
.2    36.2  14.8%
/gnu/store/h3hjni2bd8f5kbn8w0g2xx2jnagfynnr-guile-e2ed33e           29.0   =
 29.0  11.8%
/gnu/store/gfb26xn5f2qyrxzv847cd6iql6h8igh0-packages                23.1   =
 23.1   9.4%
/gnu/store/dj8rq4jd8qgk4k1hzg8djcvi6hlbdb7m-doc                     22.3   =
 22.3   9.1%
/gnu/store/a5z192qxni40ly92cknwxkpnxc8zxfgf-guix                     7.8   =
  7.8   3.2%
/gnu/store/f8wg2y3rrh4dgi6znjrv8a1l4ymkc7rb-guix-packages-source     2.8   =
  2.8   1.2%
/gnu/store/8iwm8x6472xnf7g6hfxzw029gxm1qbkq-guix-system-source       2.5   =
  2.5   1.0%
/gnu/store/2dxbf11shgvs1cgvzz6wpnfra97f0sfw-guix-extra-source        2.5   =
  2.5   1.0%
/gnu/store/q57az26zdgafic5b8xsbxlh4iwddvr86-doc                      2.1   =
  2.1   0.9%
[=E2=80=A6]
/gnu/store/0mxnx8l4fgigvd7gakwdk6hc6im4wnai-disarchive-mirrors       0.0   =
  0.0   0.0%
total: 244.8 MiB
--8<---------------cut here---------------end--------------->8---

Merely transferring it for offloading takes time; depending on machine
load, perhaps it can hit the 1h max-silent timeout?  That=E2=80=99d mean the
throughput is below 70=C2=A0KiB/s.  Plausible?

The closure of
/gnu/store/x9dyzywwmsp8vmpvpx54yhcnz7rqin0l-asciidoc-9.1.0.drv is
3.0=C2=A0MiB and it builds in 4 seconds, so there must be something else
going on, perhaps a bug in libssh/Guile-SSH leading to stale
connections?

Thoughts?

Ludo=E2=80=99.




Acknowledgement sent to Ludovic Courtès <ludo@HIDDEN>:
New bug report received and forwarded. Copy sent to othacehe@HIDDEN, bug-guix@HIDDEN. Full text available.
Report forwarded to othacehe@HIDDEN, bug-guix@HIDDEN:
bug#61839; Package guix. Full text available.
Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.
Last modified: Mon, 17 Apr 2023 23:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.