GNU bug report logs - #60803
Cuirass stopped processing jobs for aarch64-linux and x86_64-linux

Package: guix;

Reported by: Marius Bakke <marius <at> gnu.org>

Date: Sat, 14 Jan 2023 05:20:02 UTC

Severity: normal

To reply to this bug, email your comments to 60803 AT debbugs.gnu.org.

Report forwarded to othacehe <at> gnu.org, bug-guix <at> gnu.org:
bug#60803; Package guix. (Sat, 14 Jan 2023 05:20:02 GMT)

Acknowledgement sent to Marius Bakke <marius <at> gnu.org>:
New bug report received and forwarded. Copy sent to othacehe <at> gnu.org, bug-guix <at> gnu.org. (Sat, 14 Jan 2023 05:20:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Marius Bakke <marius <at> gnu.org>
To: bug-guix <at> gnu.org
Subject: Cuirass stopped processing jobs for aarch64-linux and x86_64-linux
Date: Sat, 14 Jan 2023 06:19:22 +0100
Hello Guix,

Cuirass has stopped processing (old) jobs for aarch64 and x86_64.  After
digging through the database, it turns out that (db-get-pending-build ...)
returns a build that is missing from the Jobs table:

  WITH pending_dependencies AS
  (SELECT Builds.id, count(dep.id) as deps FROM Builds
  LEFT JOIN BuildDependencies as bd ON bd.source = Builds.id
  LEFT JOIN Builds AS dep ON bd.target = dep.id AND dep.status != 0
  WHERE Builds.status = -2 AND Builds.system = 'x86_64-linux'
  GROUP BY builds.id
  ORDER BY Builds.priority ASC, Builds.timestamp DESC)
  SELECT id FROM pending_dependencies where deps = 0 limit 1;

     id
  --------
   335212

However:

  select * from jobs  where  build = 335212;
   name | evaluation | build | status | system
  ------+------------+-------+--------+--------
  (0 rows)

For clarity:

  select id,derivation,evaluation,job_name,nix_name,status from builds where id = 335212;
     id   |                            derivation                             | evaluation |       job_name        |     nix_name      | status
  --------+-------------------------------------------------------------------+------------+-----------------------+-------------------+--------
   335212 | /gnu/store/yzgcza0nijnp79mzz878q9a61p6jykgh-perftest-4.5-0.20.drv |     103435 | perftest.x86_64-linux | perftest-4.5-0.20 |     -2

The derivation is also missing from the Outputs table, which causes the
monster query in (db-get-builds ...), which is what workers call to
fetch the next job, to return nothing.
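
A check for this kind of inconsistency could look roughly like the
sketch below.  It is only an illustration: it uses Python's sqlite3 with
a toy schema rather than the real Cuirass PostgreSQL database, the
store paths are made-up placeholders, the helper name
find_orphaned_builds is hypothetical, and the assumption that Outputs
is keyed by a "derivation" column is inferred from the description
above, not checked against the actual schema.

```python
import sqlite3

# Toy schema with only the columns this report mentions.  The real Cuirass
# tables have more columns; the Outputs column names are an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Builds  (id INTEGER PRIMARY KEY, derivation TEXT, status INTEGER, system TEXT);
CREATE TABLE Jobs    (name TEXT, build INTEGER);
CREATE TABLE Outputs (derivation TEXT, name TEXT, path TEXT);
-- Build 1 is consistent; build 2 has neither a Jobs nor an Outputs row.
INSERT INTO Builds VALUES (1, '/gnu/store/placeholder-ok.drv', -2, 'x86_64-linux');
INSERT INTO Builds VALUES (2, '/gnu/store/placeholder-orphan.drv', -2, 'x86_64-linux');
INSERT INTO Jobs VALUES ('ok.x86_64-linux', 1);
INSERT INTO Outputs VALUES ('/gnu/store/placeholder-ok.drv', 'out', '/gnu/store/placeholder-ok');
""")

def find_orphaned_builds(conn, system):
    """Return ids of pending (status -2) builds lacking a Jobs or Outputs row."""
    rows = conn.execute("""
        SELECT DISTINCT b.id FROM Builds AS b
        LEFT JOIN Jobs    AS j ON j.build = b.id
        LEFT JOIN Outputs AS o ON o.derivation = b.derivation
        WHERE b.status = -2 AND b.system = ?
          AND (j.build IS NULL OR o.derivation IS NULL)
        ORDER BY b.id
    """, (system,)).fetchall()
    return [row[0] for row in rows]

orphans = find_orphaned_builds(conn, "x86_64-linux")
print(orphans)
```

Against the real database, the same LEFT JOIN ... IS NULL pattern would
presumably have flagged 335212 and 335087, but that has not been tried
here.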

335212 belongs to evaluation 103435 according to the above query, but
does not show up here:

  https://ci.guix.gnu.org/eval/103435?all=&paginate=0

The build id sequence appears to belong to this evaluation:

  https://ci.guix.gnu.org/eval/103436?all=&paginate=0

(notice how it has 335211 and 335213).

I'm not sure how to recover from this.  Either manually create the
entries in Jobs and Outputs, or delete the offending Builds entry?

The 335212 build is for x86_64-linux; we have the same problem with
335087 (also perftest) on aarch64.  i686-linux and powerpc64le-linux are
fine.

Ideas?




Information forwarded to bug-guix <at> gnu.org:
bug#60803; Package guix. (Sun, 15 Jan 2023 04:27:02 GMT)

Message #8 received at 60803 <at> debbugs.gnu.org:

From: Marius Bakke <marius <at> gnu.org>
To: 60803 <at> debbugs.gnu.org
Cc: othacehe <at> gnu.org
Subject: Re: bug#60803: Cuirass stopped processing jobs for aarch64-linux
 and x86_64-linux
Date: Sun, 15 Jan 2023 05:26:27 +0100
Marius Bakke <marius <at> gnu.org> writes:

> The 335212 build is for x86_64-linux; we have the same problem with
> 335087 (also perftest) on aarch64.  i686-linux and powerpc64le-linux are
> fine.

I deleted these two from the Builds and BuildDependencies tables which
allowed Cuirass to move forward (or backwards, really, as it was
processing new jobs just fine).
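
For reference, the cleanup amounts to something like the following
sketch.  It is not the exact commands that were run: it uses Python's
sqlite3 with a toy schema and made-up build ids instead of the live
PostgreSQL database, and the helper name delete_builds is hypothetical.
The source and target columns of BuildDependencies are the ones used in
the pending-build query from the original report.

```python
import sqlite3

# Toy data: build 10 plays the role of the stuck build.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Builds (id INTEGER PRIMARY KEY, status INTEGER);
CREATE TABLE BuildDependencies (source INTEGER, target INTEGER);
INSERT INTO Builds VALUES (10, -2), (11, -2), (12, 0);
INSERT INTO BuildDependencies VALUES (10, 12), (11, 10);
""")

def delete_builds(conn, build_ids):
    """Delete the given builds and every dependency edge that mentions them."""
    with conn:  # one transaction: commit on success, roll back on error
        for build_id in build_ids:
            conn.execute(
                "DELETE FROM BuildDependencies WHERE source = ? OR target = ?",
                (build_id, build_id))
            conn.execute("DELETE FROM Builds WHERE id = ?", (build_id,))

delete_builds(conn, [10])
remaining_builds = [row[0] for row in conn.execute("SELECT id FROM Builds ORDER BY id")]
remaining_edges = conn.execute("SELECT COUNT(*) FROM BuildDependencies").fetchone()[0]
print(remaining_builds, remaining_edges)
```

Doing both deletes in a single transaction avoids leaving dependency
edges that point at a build that no longer exists if something fails
halfway through.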

Not sure how to mitigate the problem (race when two evaluations create
different derivations with identical outputs at the same time?), but at
least we know how to deal with it.

Speaking of builds, I started debugging #60016 and accidentally deleted
build 175246!  Enough late night debugging for me...  I'll set up my own
Cuirass to experiment on "soon".

This bug report was last modified 1 year and 110 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.