GNU bug report logs - #67988
[Cuirass] ‘request-work’ responses received by several workers


Package: guix

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Sat, 23 Dec 2023 09:14:01 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 67988 in the body.
You can then email your comments to 67988 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-guix <at> gnu.org:
bug#67988; Package guix. (Sat, 23 Dec 2023 09:14:01 GMT)

Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 23 Dec 2023 09:14:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: bug-guix <at> gnu.org
Subject: [Cuirass] ‘request-work’ responses received by several workers
Date: Sat, 23 Dec 2023 10:13:01 +0100

Hello,

I’m under the impression that sometimes, when the server replies to
‘worker-request-work’ messages, its reply is received by more than just
the target worker, leading to builds being performed twice:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo grep lyhz5d1jb396m32dy0fs9h8vqzw95ddp /var/log/cuirass-remote-server.log
2023-12-23 00:15:29 141.80.167.184 (0LFowqzr): build started: '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv'.
2023-12-23 00:18:41 fetching 1 outputs of '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv' from http://141.80.167.184:5558
2023-12-23 00:18:45 build succeeded: '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv'
2023-12-23 00:21:20 141.80.167.159 (oNzYXCv5): build started: '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv'.
2023-12-23 00:24:31 fetching 1 outputs of '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv' from http://141.80.167.159:5558
2023-12-23 00:24:32 build succeeded: '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv'
ludo@berlin ~$ sudo ssh root@141.80.167.184 grep lyhz5d1jb396m32dy0fs9h8vqzw95ddp /var/log/cuirass-remote-worker.log
2023-12-23 00:12:32 0LFowqzr: building derivation `/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv' (system: x86_64-linux)
2023-12-23 00:12:54 0LFowqzr: derivation /gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv build succeeded.
ludo@berlin ~$ sudo ssh root@141.80.167.159 grep lyhz5d1jb396m32dy0fs9h8vqzw95ddp /var/log/cuirass-remote-worker.log
2023-12-23 00:17:51 oNzYXCv5: building derivation `/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv' (system: x86_64-linux)
2023-12-23 00:18:17 oNzYXCv5: derivation /gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv build succeeded.
--8<---------------cut here---------------end--------------->8---
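
Duplicates like the one in this excerpt can be spotted mechanically. The following is a minimal sketch, not part of Cuirass: it assumes the ‘build started’ line format shown in the server log above, and the name `find_duplicate_builds` is hypothetical.

```python
import re
from collections import defaultdict

# Matches server log lines of the form:
#   <date> <time> <host> (<worker>): build started: '<derivation>'.
# The format is inferred from the excerpt above, not from Cuirass's source.
STARTED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) \((?P<worker>[^)]+)\): build started: '(?P<drv>[^']+)'"
)


def find_duplicate_builds(lines):
    """Return {derivation: [(timestamp, host, worker), ...]} for every
    derivation that the log reports as started more than once."""
    started = defaultdict(list)
    for line in lines:
        m = STARTED.match(line)
        if m:
            started[m["drv"]].append((m["ts"], m["host"], m["worker"]))
    return {drv: runs for drv, runs in started.items() if len(runs) > 1}
```

The function accepts any iterable of lines, so an open file object for `/var/log/cuirass-remote-server.log` can be passed directly (reading that file may require root, as in the `sudo grep` above).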

This is with Cuirass 1.2.0-1.bdc1f9f.

To be continued…

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#67988; Package guix. (Tue, 28 May 2024 21:52:02 GMT)

Message #8 received at 67988 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 67988 <at> debbugs.gnu.org
Subject: Re: bug#67988: [Cuirass] ‘request-work’ responses received by several workers
Date: Tue, 28 May 2024 23:50:39 +0200

Ludovic Courtès <ludovic.courtes <at> inria.fr> writes:

> I’m under the impression that sometimes, when the server replies to
> ‘worker-request-work’ messages, its reply is received by more than just
> the target worker, leading to builds being performed twice:

Seen again:

--8<---------------cut here---------------start------------->8---
ludo@guix-hpc4 ~/src/cuirass$ sudo grep nmhvrka9i4qng54w3d478j1lsp9dn7r7 /var/log/cuirass-remote-server.log
2024-05-28 21:31:43 194.199.1.26 (PajrOfGX): build started: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'.
2024-05-28 21:34:22 194.199.1.27 (exataaY9): build started: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'.
2024-05-28 21:38:32 194.199.1.17 (DIwFaVSn): build started: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'.
2024-05-28 22:16:13 fetching 1 outputs of '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' from http://194.199.1.26:5558
2024-05-28 22:16:18 build succeeded: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'
2024-05-28 22:53:49 fetching 1 outputs of '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' from http://194.199.1.27:5558
2024-05-28 22:53:49 build succeeded: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'
2024-05-28 23:03:50 fetching 1 outputs of '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' from http://194.199.1.17:5558
2024-05-28 23:03:50 build succeeded: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'
--8<---------------cut here---------------end--------------->8---

And on workers:

--8<---------------cut here---------------start------------->8---
$ ssh root@guix-hpc3 grep nmhvrka9i4qng54w3d478j1lsp9dn7r7 /var/log/cuirass-remote-worker.log
2024-05-28 21:57:43 DIwFaVSn: building derivation `/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' (system: x86_64-linux)
2024-05-28 23:22:58 DIwFaVSn: derivation /gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv build succeeded.
$ ssh root@guix-hpc5 grep nmhvrka9i4qng54w3d478j1lsp9dn7r7 /var/log/cuirass-remote-worker.log
2024-05-28 21:34:13 PajrOfGX: building derivation `/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' (system: x86_64-linux)
2024-05-28 22:18:40 PajrOfGX: derivation /gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv build succeeded.
$ ssh root@guix-hpc7 grep nmhvrka9i4qng54w3d478j1lsp9dn7r7 /var/log/cuirass-remote-worker.log
2024-05-28 21:34:11 exataaY9: building derivation `/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' (system: x86_64-linux)
2024-05-28 22:53:35 exataaY9: derivation /gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv build succeeded.
--8<---------------cut here---------------end--------------->8---
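
To check whether such builds actually ran concurrently, the worker log lines can be turned into time intervals and compared pairwise. This is a rough sketch under two assumptions: the worker log format is the one shown above, and the workers' clocks are close enough to be compared. The helper names `build_intervals` and `overlapping_builds` are hypothetical.

```python
import re
from datetime import datetime
from itertools import combinations

# Line formats inferred from the worker log excerpt above.
START = re.compile(r"^(\S+ \S+) ([^:\s]+): building derivation `([^']+)'")
END = re.compile(r"^(\S+ \S+) ([^:\s]+): derivation (\S+) build succeeded\.")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_intervals(lines):
    """Map (worker, derivation) to the (start, end) datetimes of its build."""
    starts, intervals = {}, {}
    for line in lines:
        if m := START.match(line):
            ts, worker, drv = m.groups()
            starts[worker, drv] = datetime.strptime(ts, TIME_FORMAT)
        elif m := END.match(line):
            ts, worker, drv = m.groups()
            if (worker, drv) in starts:
                end = datetime.strptime(ts, TIME_FORMAT)
                intervals[worker, drv] = (starts[worker, drv], end)
    return intervals


def overlapping_builds(intervals):
    """Return (worker1, worker2, derivation) for each pair of workers whose
    builds of the same derivation overlapped in time."""
    pairs = []
    for ((w1, d1), (s1, e1)), ((w2, d2), (s2, e2)) in combinations(
        intervals.items(), 2
    ):
        # Two intervals overlap when each starts before the other ends.
        if d1 == d2 and s1 < e2 and s2 < e1:
            pairs.append((w1, w2, d1))
    return pairs
```

Feeding it the concatenated output of the three `grep` commands above gives one interval per worker; any pair it returns was building the same derivation at the same time.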

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#67988; Package guix. (Fri, 31 May 2024 19:56:02 GMT)

Message #11 received at 67988 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 67988 <at> debbugs.gnu.org
Subject: Re: bug#67988: [Cuirass] ‘request-work’ responses received by several workers
Date: Fri, 31 May 2024 21:55:16 +0200

Ludovic Courtès <ludovic.courtes <at> inria.fr> writes:

> I’m under the impression that sometimes, when the server replies to
> ‘worker-request-work’ messages, its reply is received by more than just
> the target worker, leading to builds being performed twice:

On closer inspection, the theory of the message being received by two
different peers doesn’t hold.

Instead, I believe ‘db-get-pending-build’ would return the same build at
two different points in time, typically while the first build of that
derivation was still running.

That’s normally not possible because the build’s status is changed to
‘submitted’ once it’s been picked up.  Turns out that, due to slowness
of the query in ‘db-get-pending-build’ (fixed in
17338588d4862b04e9e405c1244a2ea703b50d98), ‘remote-server’ would
sometimes fail to see worker pings in a timely fashion.  Thus, it would
call ‘db-remove-unresponsive-workers’, which would reschedule builds
that were being carried out by said worker(s).  And that’s how we would
end up with multiple concurrent builds of the same derivation.
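
The sequence described above can be illustrated with a toy model. This is only a sketch, not Cuirass's code: the `Server` class, its fields, the 60-second timeout, the ping times, and the derivation name `example.drv` are all assumptions made for illustration; only the overall mechanism (stale pings, then rescheduling of a build that is still running) comes from the explanation above.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    timeout: int  # seconds without a processed ping before a worker is dropped
    builds: dict = field(default_factory=dict)     # derivation -> status
    assigned: dict = field(default_factory=dict)   # worker -> derivation
    last_seen: dict = field(default_factory=dict)  # worker -> last processed ping
    started: list = field(default_factory=list)    # (time, worker, derivation)

    def process_ping(self, worker, now):
        self.last_seen[worker] = now

    def request_work(self, worker, now):
        """Hand the first 'scheduled' build to WORKER and mark it 'submitted'."""
        self.last_seen[worker] = now
        for drv, status in self.builds.items():
            if status == "scheduled":
                self.builds[drv] = "submitted"
                self.assigned[worker] = drv
                self.started.append((now, worker, drv))
                return drv
        return None

    def remove_unresponsive_workers(self, now):
        """Drop workers whose last processed ping is too old and put their
        builds back to 'scheduled', even if the worker is in fact still busy."""
        for worker, seen in list(self.last_seen.items()):
            if now - seen > self.timeout:
                del self.last_seen[worker]
                drv = self.assigned.pop(worker, None)
                if drv is not None:
                    self.builds[drv] = "scheduled"


def simulate(slow_query):
    """Return the workers that were told to build 'example.drv'."""
    server = Server(timeout=60)
    server.builds["example.drv"] = "scheduled"
    server.request_work("worker-a", now=0)
    if not slow_query:
        # Pings sent at t=30, 60, 90 are processed as they arrive.
        for t in (30, 60, 90):
            server.process_ping("worker-a", t)
    # With a slow query the server is stuck until t=100, so the same pings
    # are still unread when the liveness check runs.
    server.remove_unresponsive_workers(now=100)
    server.request_work("worker-b", now=101)
    return [worker for _, worker, _ in server.started]
```

In this model, `simulate(slow_query=False)` hands the build to `worker-a` only, whereas `simulate(slow_query=True)` hands it to both `worker-a` and `worker-b`, reproducing the duplicate build.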

I added logging in c2061ca845d05694ebeb88935a6ff2254711beb2, which
should give a hint, should that happen again.

Ludo’.




Bug closed; send any further explanations to 67988 <at> debbugs.gnu.org and Ludovic Courtès <ludovic.courtes <at> inria.fr>. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 04 Jun 2024 13:58:02 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 03 Jul 2024 11:24:05 GMT)

This bug report was last modified 23 days ago.


