GNU bug report logs -
#67988
[Cuirass] ‘request-work’ responses received by several workers
Report forwarded to bug-guix <at> gnu.org: bug#67988; Package guix. (Sat, 23 Dec 2023 09:14:01 GMT)
Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>: New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 23 Dec 2023 09:14:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hello,
I’m under the impression that sometimes, when the server replies to
‘worker-request-work’ messages, its reply is received by more than just
the target worker, leading to builds being performed twice:
--8<---------------cut here---------------start------------->8---
ludo <at> berlin ~$ sudo grep lyhz5d1jb396m32dy0fs9h8vqzw95ddp /var/log/cuirass-remote-server.log
2023-12-23 00:15:29 141.80.167.184 (0LFowqzr): build started: '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv'.
2023-12-23 00:18:41 fetching 1 outputs of '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv' from http://141.80.167.184:5558
2023-12-23 00:18:45 build succeeded: '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv'
2023-12-23 00:21:20 141.80.167.159 (oNzYXCv5): build started: '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv'.
2023-12-23 00:24:31 fetching 1 outputs of '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv' from http://141.80.167.159:5558
2023-12-23 00:24:32 build succeeded: '/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv'
ludo <at> berlin ~$ sudo ssh root <at> 141.80.167.184 grep lyhz5d1jb396m32dy0fs9h8vqzw95ddp /var/log/cuirass-remote-worker.log
2023-12-23 00:12:32 0LFowqzr: building derivation `/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv' (system: x86_64-linux)
2023-12-23 00:12:54 0LFowqzr: derivation /gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv build succeeded.
ludo <at> berlin ~$ sudo ssh root <at> 141.80.167.159 grep lyhz5d1jb396m32dy0fs9h8vqzw95ddp /var/log/cuirass-remote-worker.log
2023-12-23 00:17:51 oNzYXCv5: building derivation `/gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv' (system: x86_64-linux)
2023-12-23 00:18:17 oNzYXCv5: derivation /gnu/store/lyhz5d1jb396m32dy0fs9h8vqzw95ddp-cdrdao-1.2.5.drv build succeeded.
--8<---------------cut here---------------end--------------->8---
This is with Cuirass 1.2.0-1.bdc1f9f.
To be continued…
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#67988; Package guix. (Tue, 28 May 2024 21:52:02 GMT)
Message #8 received at 67988 <at> debbugs.gnu.org:
Ludovic Courtès <ludovic.courtes <at> inria.fr> wrote:
> I’m under the impression that sometimes, when the server replies to
> ‘worker-request-work’ messages, its reply is received by more than just
> the target worker, leading to builds being performed twice:
Seen again:
--8<---------------cut here---------------start------------->8---
ludo <at> guix-hpc4 ~/src/cuirass$ sudo grep nmhvrka9i4qng54w3d478j1lsp9dn7r7 /var/log/cuirass-remote-server.log
2024-05-28 21:31:43 194.199.1.26 (PajrOfGX): build started: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'.
2024-05-28 21:34:22 194.199.1.27 (exataaY9): build started: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'.
2024-05-28 21:38:32 194.199.1.17 (DIwFaVSn): build started: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'.
2024-05-28 22:16:13 fetching 1 outputs of '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' from http://194.199.1.26:5558
2024-05-28 22:16:18 build succeeded: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'
2024-05-28 22:53:49 fetching 1 outputs of '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' from http://194.199.1.27:5558
2024-05-28 22:53:49 build succeeded: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'
2024-05-28 23:03:50 fetching 1 outputs of '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' from http://194.199.1.17:5558
2024-05-28 23:03:50 build succeeded: '/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv'
--8<---------------cut here---------------end--------------->8---
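Duplicate builds like the ones in the server log excerpt above can also be spotted mechanically. The following is only a sketch: the regular expression is inferred from the ‘build started’ lines quoted in this report, not from Cuirass's source, and the `duplicate_builds` helper is a hypothetical name.

```python
import re
from collections import defaultdict

# Shape of a "build started" line, as inferred from the log excerpt in this
# report (an assumption, not a documented format):
#   <date> <time> <ip> (<worker>): build started: '<derivation>'.
STARTED = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<ip>\S+) \((?P<worker>[^)]+)\): "
    r"build started: '(?P<drv>[^']+)'"
)


def duplicate_builds(lines):
    """Return {derivation: [(timestamp, ip, worker), ...]} for every
    derivation that was started by more than one distinct worker."""
    started = defaultdict(list)
    for line in lines:
        m = STARTED.match(line)
        if m:
            started[m["drv"]].append((m["ts"], m["ip"], m["worker"]))
    return {
        drv: runs
        for drv, runs in started.items()
        if len({worker for _, _, worker in runs}) > 1
    }


# Lines copied from the server log quoted in this report.
DRV = "/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv"
sample = [
    f"2024-05-28 21:31:43 194.199.1.26 (PajrOfGX): build started: '{DRV}'.",
    f"2024-05-28 21:34:22 194.199.1.27 (exataaY9): build started: '{DRV}'.",
    f"2024-05-28 21:38:32 194.199.1.17 (DIwFaVSn): build started: '{DRV}'.",
    f"2024-05-28 22:16:13 fetching 1 outputs of '{DRV}' from http://194.199.1.26:5558",
]

for drv, runs in duplicate_builds(sample).items():
    print(drv)
    for ts, ip, worker in runs:
        print(f"  {ts}  {ip}  {worker}")
```

To scan a whole log, the same function can be given an open file object, for instance one opened on `/var/log/cuirass-remote-server.log`.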
And on workers:
--8<---------------cut here---------------start------------->8---
$ ssh root <at> guix-hpc3 grep nmhvrka9i4qng54w3d478j1lsp9dn7r7 /var/log/cuirass-remote-worker.log
2024-05-28 21:57:43 DIwFaVSn: building derivation `/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' (system: x86_64-linux)
2024-05-28 23:22:58 DIwFaVSn: derivation /gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv build succeeded.
$ ssh root <at> guix-hpc5 grep nmhvrka9i4qng54w3d478j1lsp9dn7r7 /var/log/cuirass-remote-worker.log
2024-05-28 21:34:13 PajrOfGX: building derivation `/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' (system: x86_64-linux)
2024-05-28 22:18:40 PajrOfGX: derivation /gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv build succeeded.
$ ssh root <at> guix-hpc7 grep nmhvrka9i4qng54w3d478j1lsp9dn7r7 /var/log/cuirass-remote-worker.log
2024-05-28 21:34:11 exataaY9: building derivation `/gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv' (system: x86_64-linux)
2024-05-28 22:53:35 exataaY9: derivation /gnu/store/nmhvrka9i4qng54w3d478j1lsp9dn7r7-firefox-126.0.1.drv build succeeded.
--8<---------------cut here---------------end--------------->8---
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#67988; Package guix. (Fri, 31 May 2024 19:56:02 GMT)
Message #11 received at 67988 <at> debbugs.gnu.org:
Ludovic Courtès <ludovic.courtes <at> inria.fr> wrote:
> I’m under the impression that sometimes, when the server replies to
> ‘worker-request-work’ messages, its reply is received by more than just
> the target worker, leading to builds being performed twice:
On closer inspection, the theory of the message being received by two
different peers doesn’t hold.
Instead, I believe ‘db-get-pending-build’ would return the same build at
two different points in time, typically while the first one is still
running.
That’s normally not possible because the build’s status is changed to
‘submitted’ once it’s been picked up. Turns out that, due to slowness
of the query in ‘db-get-pending-build’ (fixed in
17338588d4862b04e9e405c1244a2ea703b50d98), ‘remote-server’ would
sometimes fail to see worker pings in a timely fashion. Thus, it would
call ‘db-remove-unresponsive-workers’, which would reschedule builds
that were being carried out by said worker(s). And that’s how we would
end up with multiple concurrent builds of the same derivation.
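The sequence of events described above can be illustrated with a small, deterministic model. This is a sketch under stated assumptions, not Cuirass code: the `Server` class, its method names (loosely modeled on ‘db-get-pending-build’ and ‘db-remove-unresponsive-workers’), the 60-second timeout and the order in which the server handles queued pings are all hypothetical.

```python
from dataclasses import dataclass, field

SCHEDULED, SUBMITTED = "scheduled", "submitted"


@dataclass
class Server:
    timeout: int                                   # max seconds between processed pings
    builds: dict = field(default_factory=dict)     # derivation -> status
    assigned: dict = field(default_factory=dict)   # worker -> derivation
    last_seen: dict = field(default_factory=dict)  # worker -> time of last processed ping
    started: list = field(default_factory=list)    # (worker, derivation) history

    def ping(self, worker, now):
        self.last_seen[worker] = now

    def get_pending_build(self, worker):
        """Hand out one scheduled build and mark it as submitted."""
        for drv, status in self.builds.items():
            if status == SCHEDULED:
                self.builds[drv] = SUBMITTED
                self.assigned[worker] = drv
                self.started.append((worker, drv))
                return drv
        return None

    def remove_unresponsive_workers(self, now):
        """Drop workers whose last processed ping is too old and
        reschedule whatever they were building."""
        for worker, seen in list(self.last_seen.items()):
            if now - seen > self.timeout:
                del self.last_seen[worker]
                drv = self.assigned.pop(worker, None)
                if drv is not None:
                    self.builds[drv] = SCHEDULED


def simulate(query_delay):
    """Worker A takes the build at t=0; the server is then stuck in a
    query for QUERY_DELAY seconds and sees no pings during that time."""
    server = Server(timeout=60, builds={"firefox.drv": SCHEDULED})
    server.ping("A", now=0)
    server.get_pending_build("A")
    now = query_delay
    # Assumption: the liveness check runs before the queued pings are read.
    server.remove_unresponsive_workers(now)
    server.ping("A", now)          # A was alive all along
    server.ping("B", now)
    server.get_pending_build("B")  # B asks for work
    return server.started


print(simulate(query_delay=10))
print(simulate(query_delay=90))
```

With a short delay, worker B gets nothing because the build is still marked as submitted. With a delay longer than the timeout, worker A is wrongly considered unresponsive, its build goes back to the scheduled state, and B receives the same derivation while A is still building it.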
I added logging in c2061ca845d05694ebeb88935a6ff2254711beb2, which
should give a hint, should that happen again.
Ludo’.
bug closed, send any further explanations to 67988 <at> debbugs.gnu.org and Ludovic Courtès <ludovic.courtes <at> inria.fr>. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 04 Jun 2024 13:58:02 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 03 Jul 2024 11:24:05 GMT)
This bug report was last modified 23 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.