GNU logs - #33410, boring messages


Message sent to bug-guix@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33410: Offloaded builds can get stuck indefinitely due to network issues
Resent-From: Mark H Weaver <mhw@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guix@HIDDEN
Resent-Date: Sat, 17 Nov 2018 04:10:01 +0000
Resent-Message-ID: <handler.33410.B.15424277842023 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 33410
X-GNU-PR-Package: guix
X-GNU-PR-Keywords: 
To: 33410 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-guix@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.15424277842023
          (code B ref -1); Sat, 17 Nov 2018 04:10:01 +0000
Received: (at submit) by debbugs.gnu.org; 17 Nov 2018 04:09:44 +0000
Received: from localhost ([127.0.0.1]:56982 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gNrvU-0000WY-2G
	for submit <at> debbugs.gnu.org; Fri, 16 Nov 2018 23:09:44 -0500
Received: from eggs.gnu.org ([208.118.235.92]:33886)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mhw@HIDDEN>) id 1gNrvS-0000WL-8Q
 for submit <at> debbugs.gnu.org; Fri, 16 Nov 2018 23:09:42 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <mhw@HIDDEN>) id 1gNrvM-0000sp-DK
 for submit <at> debbugs.gnu.org; Fri, 16 Nov 2018 23:09:37 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled
 version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:41751)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <mhw@HIDDEN>) id 1gNrvM-0000sj-At
 for submit <at> debbugs.gnu.org; Fri, 16 Nov 2018 23:09:36 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43494)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <mhw@HIDDEN>) id 1gNrvL-00032C-J7
 for bug-guix@HIDDEN; Fri, 16 Nov 2018 23:09:36 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <mhw@HIDDEN>) id 1gNrvI-0000rI-Em
 for bug-guix@HIDDEN; Fri, 16 Nov 2018 23:09:35 -0500
Received: from world.peace.net ([64.112.178.59]:50540)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <mhw@HIDDEN>) id 1gNrvI-0000r9-C4
 for bug-guix@HIDDEN; Fri, 16 Nov 2018 23:09:32 -0500
Received: from mhw by world.peace.net with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
 (envelope-from <mhw@HIDDEN>)
 id 1gNrvH-0002B7-QZ; Fri, 16 Nov 2018 23:09:31 -0500
From: Mark H Weaver <mhw@HIDDEN>
Date: Fri, 16 Nov 2018 23:08:50 -0500
Message-ID: <87a7m8xs42.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -5.0 (-----)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -6.0 (------)

I just discovered that 4 out of 5 armhf build slots on Hydra have been
stuck for 24 hours, apparently after the network connections to the
build slaves were lost, possibly due to a temporary network outage.

I've seen this kind of thing happen periodically since we switched to
using guile-ssh for offloaded builds.

On Hydra I can monitor the builds and investigate when a given build
seems to be taking far too long, and I can kill those jobs to free up
the build slots.  There's no way to kill the builds from Hydra's web
interface, but I can kill them manually by logging into Hydra.

This might become a more serious problem on Berlin, as we add ARM build
slaves that are not on the same local network as Berlin itself, until
the web interface allows for this kind of monitoring and intervention.

      Mark




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Mark H Weaver <mhw@HIDDEN>
Subject: bug#33410: Acknowledgement (Offloaded builds can get stuck
 indefinitely due to network issues)
Message-ID: <handler.33410.B.15424277842023.ack <at> debbugs.gnu.org>
References: <87a7m8xs42.fsf@HIDDEN>
X-Gnu-PR-Message: ack 33410
X-Gnu-PR-Package: guix
Reply-To: 33410 <at> debbugs.gnu.org
Date: Sat, 17 Nov 2018 04:10:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-guix@HIDDEN

If you wish to submit further information on this problem, please
send it to 33410 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
33410: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=33410
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-guix@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33410: Offloaded builds can get stuck indefinitely due to network issues
Resent-From: ludo@HIDDEN (Ludovic Courtès)
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guix@HIDDEN
Resent-Date: Sat, 17 Nov 2018 14:22:02 +0000
Resent-Message-ID: <handler.33410.B33410.15424645019089 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33410
X-GNU-PR-Package: guix
X-GNU-PR-Keywords: 
To: Mark H Weaver <mhw@HIDDEN>
Cc: 33410 <at> debbugs.gnu.org
Received: via spool by 33410-submit <at> debbugs.gnu.org id=B33410.15424645019089
          (code B ref 33410); Sat, 17 Nov 2018 14:22:02 +0000
Received: (at 33410) by debbugs.gnu.org; 17 Nov 2018 14:21:41 +0000
Received: from localhost ([127.0.0.1]:57164 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gO1Tg-0002MX-UD
	for submit <at> debbugs.gnu.org; Sat, 17 Nov 2018 09:21:41 -0500
Received: from eggs.gnu.org ([208.118.235.92]:45774)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1gO1Tf-0002ML-L3
 for 33410 <at> debbugs.gnu.org; Sat, 17 Nov 2018 09:21:39 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <ludo@HIDDEN>) id 1gO1TZ-00034q-MT
 for 33410 <at> debbugs.gnu.org; Sat, 17 Nov 2018 09:21:34 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled
 version=3.3.2
Received: from fencepost.gnu.org ([2001:4830:134:3::e]:39103)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <ludo@HIDDEN>)
 id 1gO1TZ-00034m-Jb; Sat, 17 Nov 2018 09:21:33 -0500
Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=49876 helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <ludo@HIDDEN>)
 id 1gO1TZ-0004Tq-Bv; Sat, 17 Nov 2018 09:21:33 -0500
From: ludo@HIDDEN (Ludovic Courtès)
References: <87a7m8xs42.fsf@HIDDEN>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: 27 Brumaire an 227 de la Révolution
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Sat, 17 Nov 2018 15:21:32 +0100
In-Reply-To: <87a7m8xs42.fsf@HIDDEN> (Mark H. Weaver's message of "Fri, 16
 Nov 2018 23:08:50 -0500")
Message-ID: <87efbjokcz.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-Spam-Score: -5.0 (-----)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -6.0 (------)

Hello,

Mark H Weaver <mhw@HIDDEN> skribis:

> I just discovered that 4 out of 5 armhf build slots on Hydra have been
> stuck for 24 hours, apparently after the network connections to the
> build slaves were lost, possibly due to a temporary network outage.
>
> I've seen this kind of thing happen periodically since we switched to
> using guile-ssh for offloaded builds.

Which guix-daemon version is hydra running?

Commit a708de151c255712071e42e5c8284756b51768cd adds a safeguard to make
sure timeouts are honored, though there might be some cases where it
doesn’t quite work as expected (I suspect libssh handles EINTR
internally by looping, in which case our signal handling async doesn’t
get a chance to run.)
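
The suspected failure mode can be illustrated with a small Python sketch.
It is not Guix, guile-ssh, or libssh code: FakeChannel, read_retrying_blindly,
read_honoring_deadline, and the timed_out callback are all hypothetical names
made up for this illustration, and the interruption is simulated with an
exception rather than a real signal. It only shows why a loop that silently
retries on EINTR can keep a caller's timeout check from ever running.

```python
class FakeChannel:
    """Hypothetical stand-in for a blocking read that keeps being interrupted."""

    def __init__(self, interruptions):
        self.interruptions = interruptions
        self.calls = 0

    def read(self):
        self.calls += 1
        if self.calls <= self.interruptions:
            # Simulates a system call failing with EINTR after a signal.
            raise InterruptedError("EINTR")
        return b"data"


def read_retrying_blindly(channel):
    """Retry on EINTR without returning to the caller.

    Any timeout logic that lives in the caller never gets a chance to run
    while this loop is spinning.
    """
    while True:
        try:
            return channel.read()
        except InterruptedError:
            continue


def read_honoring_deadline(channel, timed_out):
    """Retry on EINTR, but consult the caller's timeout check between attempts."""
    while True:
        try:
            return channel.read()
        except InterruptedError:
            if timed_out():
                raise TimeoutError("build timed out")
```

In the first variant the interruption is absorbed inside the loop, so a
timeout that was meant to be noticed on interruption is never observed. The
second variant gives the timeout check a turn each time the read is
interrupted.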

> On Hydra I can monitor the builds and investigate when a given build
> seems to be taking far too long, and I can kill those jobs to free up
> the build slots.  There's no way to kill the builds from Hydra's web
> interface, but I can kill them manually by logging into Hydra.
>
> This might become a more serious problem on Berlin, as we add ARM build
> slaves that are not on the same local network as Berlin itself, until
> the web interface allows for this kind of monitoring and intervention.

The current situation on berlin is suboptimal: I run ‘guix processes’
when I suspect something is wrong, and that’s how I found out about
<https://issues.guix.info/issue/33239>.

Thanks,
Ludo’.





Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.