GNU bug report logs - #59514
Stuck builds in Cuirass

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: guix; Reported by: Marius Bakke <marius@HIDDEN>; Keywords: wontfix; Done: Ludovic Courtès <ludo@HIDDEN>; Maintainer for guix is bug-guix@HIDDEN.
bug closed, send any further explanations to 59514 <at> debbugs.gnu.org and Marius Bakke <marius@HIDDEN> Request was from Ludovic Courtès <ludo@HIDDEN> to control <at> debbugs.gnu.org. Full text available.
Added tag(s) wontfix. Request was from Ludovic Courtès <ludo@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 59514 <at> debbugs.gnu.org:


Received: (at 59514) by debbugs.gnu.org; 23 Nov 2022 13:27:05 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Nov 23 08:27:05 2022
Received: from localhost ([127.0.0.1]:54159 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oxpmK-0005MO-TX
	for submit <at> debbugs.gnu.org; Wed, 23 Nov 2022 08:27:05 -0500
Received: from eggs.gnu.org ([209.51.188.92]:51222)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1oxpmG-0005Lr-1H
 for 59514 <at> debbugs.gnu.org; Wed, 23 Nov 2022 08:27:03 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>) id 1oxpmA-0006gp-Ou
 for 59514 <at> debbugs.gnu.org; Wed, 23 Nov 2022 08:26:54 -0500
Received: from 2a02-8429-81d2-3d01-94c9-8097-ea5c-2775.rev.sfr.net
 ([2a02:8429:81d2:3d01:94c9:8097:ea5c:2775] helo=meije)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>)
 id 1oxpm9-0002jW-NS; Wed, 23 Nov 2022 08:26:54 -0500
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Marius Bakke <marius@HIDDEN>
Subject: Re: bug#59514: Stuck builds in Cuirass
References: <87tu2pzvfo.fsf@HIDDEN>
Date: Wed, 23 Nov 2022 14:26:50 +0100
In-Reply-To: <87tu2pzvfo.fsf@HIDDEN> (Marius Bakke's message of "Wed, 23 Nov
 2022 13:50:35 +0100")
Message-ID: <874jupx0md.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 59514
Cc: 59514 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hello Marius,

> Cuirass has a tendency to not notice when a build is finished, leaving
> it in a "running" state.
>
> The phenomenon can be observed by going to
> <https://ci.guix.gnu.org/status> and looking at builds that have been
> running for a suspiciously long time.

I suspect this is caused by https://issues.guix.gnu.org/59510, which
makes the worker threads bail out.

We can probably merge those two issues. The
/var/log/cuirass-remote-server.log file on Berlin also indicates when
the build-succeeded or build-failed message is received by the server,
and how long the fetch from the worker took.
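
For anyone reading that log, here is a minimal Python sketch for pulling
out those lines. It assumes only that each relevant line contains the
literal message name; the rest of the log format is not described here,
and completion_lines is a hypothetical helper, not part of Cuirass:

```python
from typing import Iterable, Iterator, Tuple

# Message names as mentioned above; the surrounding line format is an assumption.
COMPLETION_MARKERS = ("build-succeeded", "build-failed")


def completion_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, marker, line) for each line naming a completion message."""
    for number, line in enumerate(lines, start=1):
        for marker in COMPLETION_MARKERS:
            if marker in line:
                # Report the first marker found and move on to the next line.
                yield number, marker, line.rstrip("\n")
                break
```

It could be fed an open file, for instance
completion_lines(open("/var/log/cuirass-remote-server.log")), to check
whether the server ever heard back about a given build.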

Thanks,

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#59514; Package guix. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 23 Nov 2022 12:50:49 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Nov 23 07:50:49 2022
Received: from localhost ([127.0.0.1]:54096 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oxpDF-0004QB-2M
	for submit <at> debbugs.gnu.org; Wed, 23 Nov 2022 07:50:49 -0500
Received: from lists.gnu.org ([209.51.188.17]:57328)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <marius@HIDDEN>) id 1oxpDD-0004Q2-6X
 for submit <at> debbugs.gnu.org; Wed, 23 Nov 2022 07:50:47 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <marius@HIDDEN>) id 1oxpDC-0008JQ-Vr
 for bug-guix@HIDDEN; Wed, 23 Nov 2022 07:50:47 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <marius@HIDDEN>) id 1oxpDC-0002oK-Mg
 for bug-guix@HIDDEN; Wed, 23 Nov 2022 07:50:46 -0500
Received: from [2a02:2121:302:f5f9:52eb:71ff:fe49:3a13] (helo=localhost)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <marius@HIDDEN>) id 1oxpDB-000194-9n
 for bug-guix@HIDDEN; Wed, 23 Nov 2022 07:50:46 -0500
From: Marius Bakke <marius@HIDDEN>
To: bug-guix@HIDDEN
Subject: Stuck builds in Cuirass
X-Debbugs-CC: othacehe@HIDDEN
Date: Wed, 23 Nov 2022 13:50:35 +0100
Message-ID: <87tu2pzvfo.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="==-=-=";
 micalg=pgp-sha512; protocol="application/pgp-signature"
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit

--==-=-=
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Hi,

Cuirass has a tendency to not notice when a build is finished, leaving
it in a "running" state.

The phenomenon can be observed by going to
<https://ci.guix.gnu.org/status> and looking at builds that have been
running for a suspiciously long time.

Typically the build log will indicate that it has finished, yet Cuirass
is patiently waiting...and not scheduling further builds.

Restarting the builds typically gets things going again.
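
The "suspiciously long time" test can be made explicit. What follows is
only a sketch, assuming the status page prints ages as "<number> <unit>
ago"; parse_age, is_stuck and the unit table are hypothetical names, not
part of Cuirass, and the unit spellings are guesses:

```python
import re
from datetime import timedelta
from typing import Optional

AGE_RE = re.compile(r"(\d+) (\w+) ago")

# Assumed unit spellings; the status page may use others.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_age(text: str) -> Optional[timedelta]:
    """Turn e.g. "2 hours ago" into a timedelta, or None if it cannot be parsed."""
    match = AGE_RE.search(text)
    if match is None:
        return None  # e.g. "seconds ago", which carries no number
    count = int(match.group(1))
    unit = match.group(2).rstrip("s")  # "hours" -> "hour"
    seconds = UNIT_SECONDS.get(unit)
    if seconds is None:
        return None
    return timedelta(seconds=count * seconds)


def is_stuck(text: str, limit: timedelta = timedelta(hours=1)) -> bool:
    """True when the parsed age is strictly greater than the limit."""
    age = parse_age(text)
    return age is not None and age > limit
```

With the default limit, is_stuck("2 hours ago") is true while
is_stuck("45 minutes ago") and is_stuck("seconds ago") are false.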

I wrote a nasty script to automatically restart builds that are running
for >1 hour, but it's not a sustainable solution:


--=-=-=
Content-Type: text/plain
Content-Disposition: attachment; filename=restart-old-builds.py
Content-Transfer-Encoding: quoted-printable

#!/usr/bin/env python3

# Restart stuck builds....   TODO fix cuirass properly.

import re
import time

import requests
from bs4 import BeautifulSoup

builds_page = "https://ci.guix.gnu.org/status"
builds_html = requests.get(builds_page).text

soup = BeautifulSoup(builds_html, "html5lib")
main = soup.find('main', {'id': 'content'})
table = main.find('table')

# Map each build id to the details shown in its row of the status table.
result = {}

for row in table.find_all('tr'):
    data = row.find_all('td')
    if len(data) > 0:
        build_id = row.find('a').contents[0]
        name = data[0].contents[0]
        age = data[1].contents[0]
        system = data[2].contents[0]
        log = data[3]  # build log cell (unused)

        result[build_id] = {'name': name, 'age': age, 'system': system}

# Raw string so that \d and \w reach the regex engine unchanged.
age_re = re.compile(r"(\d+) (\w+) ago")
restart = []

for build_id in result.keys():
    age = result[build_id]['age']
    match = age_re.match(age)
    if match is not None:  # no match for ages without a number, e.g. "seconds ago"
        digits = match.group(1)
        time_unit = match.group(2)
        if time_unit == "hours":
            restart.append(build_id)
        elif time_unit == "minutes" and int(digits) > 60:
            restart.append(build_id)

# Client certificate and key used to authenticate against the admin endpoint.
certificate_file = "/home/marius/tmp/mbakke.cert.pem"
certificate_key = "/home/marius/tmp/mbakke.key.pem"

print(f"Found {len(restart)} stuck builds..!")

for build_id in restart:
    print(f"Going to restart {result[build_id]['name']} ({build_id}, running since {result[build_id]['age']})...")
    requests.get(f"https://ci.guix.gnu.org/admin/build/{build_id}/restart",
                 cert=(certificate_file, certificate_key))
    time.sleep(3)  # pause between restart requests

--=-=-=--

--==-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iIUEARYKAC0WIQRNTknu3zbaMQ2ddzTocYulkRQQdwUCY34XGw8cbWFyaXVzQGdu
dS5vcmcACgkQ6HGLpZEUEHcYCQD/WbYxZ+Mi1I4kYSCKqRmuVrucf7oVXlZwAyFT
KHhbOrQA/jUT3vZCpeiiSPWyxedXqYOBllkcvQXgmT3tj4RPcZMH
=pDj4
-----END PGP SIGNATURE-----
--==-=-=--




Acknowledgement sent to Marius Bakke <marius@HIDDEN>:
New bug report received and forwarded. Copy sent to othacehe@HIDDEN, bug-guix@HIDDEN. Full text available.
Report forwarded to othacehe@HIDDEN, bug-guix@HIDDEN:
bug#59514; Package guix. Full text available.
Last modified: Sun, 14 Jul 2024 21:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.