GNU bug report logs - #61646
Bandwidth-induced offload timeout abort whole operating

Package: guix

Reported by: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Date: Mon, 20 Feb 2023 03:29:02 UTC

Severity: normal

Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 61646 in the body.
You can then email your comments to 61646 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-guix <at> gnu.org:
bug#61646; Package guix. (Mon, 20 Feb 2023 03:29:02 GMT)

Acknowledgement sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 20 Feb 2023 03:29:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: bug-guix <bug-guix <at> gnu.org>
Subject: Bandwidth-induced offload timeout abort whole operating
Date: Sun, 19 Feb 2023 22:28:16 -0500
Hi Guix,

I can reproduce this rather easily on my system:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build icedove
The following derivations will be built:
  /gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv
  /gnu/store/8zi808086b3vlfjrhdm87fgljziwdqx2-icedove-l10n-102.7.2.drv
  /gnu/store/v0sq7rb8fk36kjasb27a71z1a27wxb1s-icedove-minimal-102.7.2.drv
process 19542 acquired build slot '/var/guix/offload/localhost:6666/0'
normalized load on machine 'localhost' is 0.08
building /gnu/store/8zi808086b3vlfjrhdm87fgljziwdqx2-icedove-l10n-102.7.2.drv...
process 19548 acquired build slot '/var/guix/offload/localhost:6666/1'
normalized load on machine 'localhost' is 0.08
building /gnu/store/v0sq7rb8fk36kjasb27a71z1a27wxb1s-icedove-minimal-102.7.2.drv...
guix offload: sending 1 store item (558 MiB) to 'localhost'...
exporting path `/gnu/store/bwb5hcdyzgq16kmbsva7ax0zq6lzg78z-icedove-102.7.2.tar.xz'
guix offload: error: failed to connect to 'localhost': Timeout connecting to localhost
cannot build derivation `/gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv': 1 dependencies couldn't be built
guix build: error: build of
  `/gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv' failed
--8<---------------cut here---------------end--------------->8---

The third derivation tries to get a build slot and times out, because
the first two have already saturated the bandwidth of the link and it
takes more time than expected to get a reply.

The workaround is to use '-k' (short for '--keep-going') and to retry
the third, failing derivation once the first two have completed.
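
As a minimal sketch of that workaround: the helper below runs the build
once with '-k' and, if anything failed, runs it a second time.  The
function name build_keep_going_then_retry and the GUIX variable are
hypothetical, introduced only for illustration; they are not part of
Guix.

```shell
# Hypothetical helper, not part of Guix: build with -k, then retry once.
# GUIX may be overridden, e.g. GUIX="./pre-inst-env guix" in a source checkout.
GUIX="${GUIX:-guix}"

build_keep_going_then_retry() {
    # First pass: -k (--keep-going) lets independent derivations finish
    # even if one of them fails, for instance on an offload timeout.
    $GUIX build -k "$@" && return 0

    # Second pass: whatever built successfully is already in the store, so
    # only the derivations that failed (and their dependents) are retried.
    $GUIX build "$@"
}

# Usage: build_keep_going_then_retry icedove
```

By the time the second pass starts, the large transfers from the first
pass are over, so the connection attempt no longer competes for
bandwidth.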

I don't have a clear idea of how to improve the situation other than
using longer timeouts... but perhaps these timeouts could be dynamic,
based on the network or CPU load?
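
To make the idea of a load-dependent timeout concrete, here is a small
sketch.  It does not describe how 'guix offload' behaves; the function
name scaled_timeout and the scaling rule (base timeout multiplied by
one plus a caller-supplied normalized load) are assumptions chosen for
illustration.

```shell
# Hypothetical sketch: stretch a base timeout in proportion to a load figure.
# $1 = base timeout in seconds, $2 = normalized load (0 means idle).
scaled_timeout() {
    # The shell only does integer arithmetic, so let awk handle the
    # fractional load; adding 0.5 before truncation rounds to nearest.
    awk -v base="$1" -v load="$2" \
        'BEGIN { printf "%d\n", base * (1 + load) + 0.5 }'
}

# Usage: timeout_seconds=$(scaled_timeout 10 0.08)
```

The load value is left to the caller on purpose: a CPU load average
says little about a saturated link, so a real implementation would need
some measure of network congestion instead.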

-- 
Thanks,
Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#61646; Package guix. (Thu, 23 Feb 2023 22:27:01 GMT)

Message #8 received at 61646 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 61646 <at> debbugs.gnu.org
Subject: Re: bug#61646: Bandwidth-induced offload timeout abort whole operating
Date: Thu, 23 Feb 2023 23:26:22 +0100
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:

> I can reproduce this rather easily on my system:
>
> $ ./pre-inst-env guix build icedove
> The following derivations will be built:
>   /gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv
>   /gnu/store/8zi808086b3vlfjrhdm87fgljziwdqx2-icedove-l10n-102.7.2.drv
>   /gnu/store/v0sq7rb8fk36kjasb27a71z1a27wxb1s-icedove-minimal-102.7.2.drv
> process 19542 acquired build slot '/var/guix/offload/localhost:6666/0'
> normalized load on machine 'localhost' is 0.08
> building /gnu/store/8zi808086b3vlfjrhdm87fgljziwdqx2-icedove-l10n-102.7.2.drv...
> process 19548 acquired build slot '/var/guix/offload/localhost:6666/1'
> normalized load on machine 'localhost' is 0.08
> building /gnu/store/v0sq7rb8fk36kjasb27a71z1a27wxb1s-icedove-minimal-102.7.2.drv...
> guix offload: sending 1 store item (558 MiB) to 'localhost'...
> exporting path `/gnu/store/bwb5hcdyzgq16kmbsva7ax0zq6lzg78z-icedove-102.7.2.tar.xz'
> guix offload: error: failed to connect to 'localhost': Timeout connecting to localhost
> cannot build derivation `/gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv': 1 dependencies couldn't be built
> guix build: error: build of
>   `/gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv' failed
>
> The third derivation tries to get a build slot and times out, because
> the first two have already saturated the bandwidth of the link and it
> takes more time than expected to get a reply.

Weird.  Since it’s a timeout while connecting, I suppose the patch
below would improve the situation:

diff --git a/guix/scripts/offload.scm b/guix/scripts/offload.scm
index 578b3b9888..90cf97401c 100644
--- a/guix/scripts/offload.scm
+++ b/guix/scripts/offload.scm
@@ -220,7 +220,7 @@ (define* (open-ssh-session machine #:optional max-silent-time)
         (session (make-session #:user (build-machine-user machine)
                                #:host (build-machine-name machine)
                                #:port (build-machine-port machine)
-                               #:timeout 10       ;initial timeout (seconds)
+                               #:timeout 30       ;initial timeout (seconds)
                                ;; #:log-verbosity 'protocol
                                #:identity (build-machine-private-key machine)
 
WDYT?

Ludo’.

Information forwarded to bug-guix <at> gnu.org:
bug#61646; Package guix. (Sat, 25 Feb 2023 02:47:02 GMT)

Message #11 received at 61646 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 61646 <at> debbugs.gnu.org
Subject: Re: bug#61646: Bandwidth-induced offload timeout abort whole operating
Date: Fri, 24 Feb 2023 21:46:29 -0500
Hi Ludovic,

Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi Maxim,
>
> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:
>
>> I can reproduce this rather easily on my system:
>>
>> $ ./pre-inst-env guix build icedove
>> The following derivations will be built:
>>   /gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv
>>   /gnu/store/8zi808086b3vlfjrhdm87fgljziwdqx2-icedove-l10n-102.7.2.drv
>>   /gnu/store/v0sq7rb8fk36kjasb27a71z1a27wxb1s-icedove-minimal-102.7.2.drv
>> process 19542 acquired build slot '/var/guix/offload/localhost:6666/0'
>> normalized load on machine 'localhost' is 0.08
>> building /gnu/store/8zi808086b3vlfjrhdm87fgljziwdqx2-icedove-l10n-102.7.2.drv...
>> process 19548 acquired build slot '/var/guix/offload/localhost:6666/1'
>> normalized load on machine 'localhost' is 0.08
>> building /gnu/store/v0sq7rb8fk36kjasb27a71z1a27wxb1s-icedove-minimal-102.7.2.drv...
>> guix offload: sending 1 store item (558 MiB) to 'localhost'...
>> exporting path `/gnu/store/bwb5hcdyzgq16kmbsva7ax0zq6lzg78z-icedove-102.7.2.tar.xz'
>> guix offload: error: failed to connect to 'localhost': Timeout connecting to localhost
>> cannot build derivation
>> `/gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv': 1
>> dependencies couldn't be built
>> guix build: error: build of
>>   `/gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv' failed
>>
>> The third derivation tries to get a build slot and times out, because
>> the first two have already saturated the bandwidth of the link and it
>> takes more time than expected to get a reply.
>
> Weird.  Since it’s a timeout while connecting, I suppose the patch
> below would improve the situation:
>
> diff --git a/guix/scripts/offload.scm b/guix/scripts/offload.scm
> index 578b3b9888..90cf97401c 100644
> --- a/guix/scripts/offload.scm
> +++ b/guix/scripts/offload.scm
> @@ -220,7 +220,7 @@ (define* (open-ssh-session machine #:optional max-silent-time)
>          (session (make-session #:user (build-machine-user machine)
>                                 #:host (build-machine-name machine)
>                                 #:port (build-machine-port machine)
> -                               #:timeout 10       ;initial timeout (seconds)
> +                               #:timeout 30       ;initial timeout (seconds)
>                                 ;; #:log-verbosity 'protocol
>                                 #:identity (build-machine-private-key machine)

Hm, how can I test this again?

I tried launching a daemon both on the remote and locally, with
something like:

sudo -E ./pre-inst-env ./guix-daemon --build-users-group guixbuild \
 --max-silent-time 0 --timeout 0 --log-compression none --discover=yes \
 --substitute-urls "https://ci.guix.gnu.org https://bordeaux.guix.gnu.org" \
 --max-jobs=20

but the edited code doesn't seem to run: I put an (error 'hello) in
there and nothing happened.

-- 
Thanks,
Maxim




Reply sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
You have taken responsibility. (Sat, 25 Feb 2023 03:08:02 GMT)

Notification sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
bug acknowledged by developer. (Sat, 25 Feb 2023 03:08:02 GMT)

Message #16 received at 61646-done <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 61646-done <at> debbugs.gnu.org
Subject: Re: bug#61646: Bandwidth-induced offload timeout abort whole operating
Date: Fri, 24 Feb 2023 22:07:43 -0500
Hello,

Ludovic Courtès <ludo <at> gnu.org> writes:

[...]

> Weird.  Since it’s a timeout while connecting, I suppose the patch
> below would improve the situation:
>
> diff --git a/guix/scripts/offload.scm b/guix/scripts/offload.scm
> index 578b3b9888..90cf97401c 100644
> --- a/guix/scripts/offload.scm
> +++ b/guix/scripts/offload.scm
> @@ -220,7 +220,7 @@ (define* (open-ssh-session machine #:optional max-silent-time)
>          (session (make-session #:user (build-machine-user machine)
>                                 #:host (build-machine-name machine)
>                                 #:port (build-machine-port machine)
> -                               #:timeout 10       ;initial timeout (seconds)
> +                               #:timeout 30       ;initial timeout (seconds)
>                                 ;; #:log-verbosity 'protocol
>                                 #:identity (build-machine-private-key machine)

Never mind my previous message: --sysconfdir had not been set, so my
offload setup (/etc/guix/machines.scm) was being ignored.  With that
fixed, the following command worked to test the change from the local
machine:

--8<---------------cut here---------------start------------->8---
sudo -E ./pre-inst-env ./guix-daemon --build-users-group guixbuild \
 --max-silent-time 0 --timeout 0 --log-compression none --discover=yes \
 --substitute-urls "https://ci.guix.gnu.org https://bordeaux.guix.gnu.org" \
 --max-jobs=4
--8<---------------cut here---------------end--------------->8---

I pushed the fix in commit 53d718f61b.

Closing, thank you!

-- 
Thanks,
Maxim




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 25 Mar 2023 11:24:07 GMT)

This bug report was last modified 1 year and 25 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.