GNU bug report logs -
#42548
Cuirass 504 errors
Reported by: Mathieu Othacehe <othacehe <at> gnu.org>
Date: Sun, 26 Jul 2020 16:11:02 UTC
Severity: normal
Done: Mathieu Othacehe <othacehe <at> gnu.org>
Bug is archived. No further changes may be made.
Report forwarded to bug-guix <at> gnu.org: bug#42548; Package guix. (Sun, 26 Jul 2020 16:11:02 GMT)
Acknowledgement sent to Mathieu Othacehe <othacehe <at> gnu.org>: New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sun, 26 Jul 2020 16:11:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
Back from holidays, perfect time to fix some Cuirass issues :) The
Cuirass web interface frequently serves 504 errors for all requests,
requiring a service restart on berlin.
Having a look at /var/log/cuirass-web.log, it seems that we indeed have
multiple things going wrong.
A first problem is caused by checkout entries pointing to removed
inputs. This should be fixed by f71f026a41d8e68e4a7f11ef6e708964594a599c
in Cuirass.
A second issue is caused when a build product download is started, then
aborted. In that case, sendfile throws an exception or enters an endless
loop.
There's a third issue, but the cause is not clear to me:
--8<---------------cut here---------------start------------->8---
Uncaught exception in fiber ##f:
In ice-9/boot-9.scm:
1736:10 5 (with-exception-handler _ _ #:unwind? _ # _)
In web/server/fiberized.scm:
160:26 4 (_)
In ice-9/suspendable-ports.scm:
83:4 3 (write-bytes #<closed: file 7f3a4ed46310> #vu8(60 33 ?) ?)
In unknown file:
2 (port-write #<closed: file 7f3a4ed46310> #vu8(60 33 # ?) ?)
In ice-9/boot-9.scm:
1669:16 1 (raise-exception _ #:continuable? _)
1669:16 0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
--8<---------------cut here---------------end--------------->8---
Thanks,
Mathieu
Information forwarded to bug-guix <at> gnu.org: bug#42548; Package guix. (Mon, 27 Jul 2020 22:13:02 GMT)
Message #8 received at 42548 <at> debbugs.gnu.org (full text, mbox):
Hi Mathieu,
On Sun, 26 Jul 2020 at 18:10, Mathieu Othacehe <othacehe <at> gnu.org> wrote:
> A second issue is caused when a build product download is started, then
> aborted. In that case, sendfile throws an exception or enters an endless
> loop.
What do you mean by “build product download is started, then aborted”?
Cheers,
simon
Information forwarded to bug-guix <at> gnu.org: bug#42548; Package guix. (Tue, 28 Jul 2020 07:33:02 GMT)
Message #11 received at 42548 <at> debbugs.gnu.org (full text, mbox):
Hey zimoun,
> What do you mean by “build product download is started, then aborted”?
Here I mean clicking on the downloadable image here[1] and then hitting
"cancel" when the download popup appears, or the abort button later on,
once the download has started.
Thanks,
Mathieu
[1]: https://ci.guix.gnu.org/build/3031091/details
Information forwarded to bug-guix <at> gnu.org: bug#42548; Package guix. (Tue, 28 Jul 2020 08:50:02 GMT)
Message #14 received at 42548 <at> debbugs.gnu.org (full text, mbox):
Hi Mathieu,
On Tue, 28 Jul 2020 at 09:32, Mathieu Othacehe <othacehe <at> gnu.org> wrote:
> Here I mean clicking on the downloadable image here[1] and then hit
> "cancel" when the download popup appears, or the abort button later on,
> when the download is started.
Ah, that’s annoying indeed. :-)
And does it mess up Cuirass if the connection is lost, e.g. if the
network goes down?
Cheers,
simon
Information forwarded to bug-guix <at> gnu.org: bug#42548; Package guix. (Tue, 28 Jul 2020 14:58:01 GMT)
Message #17 received at 42548 <at> debbugs.gnu.org (full text, mbox):
> And does it mess Cuirass if the connection is lost e.g. down the
> network?
Not sure yet, I also found this message:
--8<---------------cut here---------------start------------->8---
Uncaught exception in fiber ##f:
In ice-9/boot-9.scm:
1736:10 5 (with-exception-handler _ _ #:unwind? _ # _)
In web/server/fiberized.scm:
160:26 4 (_)
In ice-9/suspendable-ports.scm:
83:4 3 (write-bytes #<closed: file 7ff11c2dec40> #vu8(60 33 ?) ?)
In unknown file:
2 (port-write #<closed: file 7ff11c2dec40> #vu8(60 33 # ?) ?)
In ice-9/boot-9.scm:
1669:16 1 (raise-exception _ #:continuable? _)
1669:16 0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---
that suggests that we try to write something to a closed file.
To be investigated :)
Mathieu
Information forwarded to bug-guix <at> gnu.org: bug#42548; Package guix. (Thu, 30 Jul 2020 14:48:01 GMT)
Message #20 received at 42548 <at> debbugs.gnu.org (full text, mbox):
Hey,
> A second issue is caused when a build product download is started, then
> aborted. In that case, sendfile throws an exception or enters an endless
> loop.
Ok, so I found a couple of errors here. First, I noticed that it was not
possible to download two build products simultaneously, because the
first download was blocking the whole process.
This is solved by 6ad9c602697ffe33c8fbb09ccd796b74bf600223. In short,
current-fiber was set to #f, both in the context of the caller and the
spawned thread. So I think the get-message operation was blocking the
whole thread instead of suspending the current fiber. But if someone
else could take a look, that would be nice :).
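To illustrate the difference between blocking a whole thread and
suspending one cooperative task, here is a rough analogy in Python's
asyncio rather than Guile Fibers. It is not the Cuirass code: the names
waiter and ticker are made up for this sketch, and asyncio.Queue only
stands in for a fibers channel.

```python
import asyncio

async def waiter(queue, log):
    # Awaiting the queue suspends only this coroutine; the event loop
    # stays free to run other tasks (the analogue of suspending a fiber).
    item = await queue.get()
    log.append(("waiter", item))

async def ticker(queue, log):
    # Makes progress while the waiter is suspended, then unblocks it.
    for i in range(3):
        log.append(("tick", i))
        await asyncio.sleep(0)
    await queue.put("done")

async def main():
    queue = asyncio.Queue()
    log = []
    await asyncio.gather(waiter(queue, log), ticker(queue, log))
    return log

log = asyncio.run(main())
```

If the waiter used a blocking receive instead of an awaited one, the
ticker would never get to run, which is roughly what the first download
was doing to the second.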
Second issue: sendfile may throw EPIPE or ECONNRESET if the client
disconnects before the end of the transfer. I think that, besides the
dirty backtrace, it was not harmful. But anyway, it's better to catch
this, as we do in "guix publish"; see
0955a11abd9e27c96a1375cca6a1c97869b5780a.
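As a minimal sketch of the idea (in Python, not the actual Guile change
from that commit), catching the two errors around the send loop could
look like this. The helper name send_payload and the chunk size are
assumptions made for the example, and a local socket pair simulates the
aborted download.

```python
import socket

def send_payload(conn, payload, chunk_size=4096):
    """Send payload over conn; return False if the client went away."""
    try:
        for offset in range(0, len(payload), chunk_size):
            conn.sendall(payload[offset:offset + chunk_size])
    except (BrokenPipeError, ConnectionResetError):
        # EPIPE / ECONNRESET: the client aborted the download, so stop
        # quietly instead of letting the exception escape.
        return False
    return True

# Simulate a user cancelling the download: the peer closes before we send.
server_side, client_side = socket.socketpair()
client_side.close()
completed = send_payload(server_side, b"x" * 10000)
server_side.close()
```

Returning a flag rather than re-raising keeps the server loop clean while
still letting the caller log the aborted transfer if desired.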
I fear it won't be enough to fix the 504 errors, but at least it's a
start.
Thanks,
Mathieu
Information forwarded to bug-guix <at> gnu.org: bug#42548; Package guix. (Tue, 04 Aug 2020 16:49:01 GMT)
Message #23 received at 42548 <at> debbugs.gnu.org (full text, mbox):
Hello,
> that suggests that we try to write something to a closed file.
>
> To be investigated :)
Ok, so I have a better grasp of what's going on. The Cuirass web server
was receiving requests such as "/builds/1234)" which were not rejected
and, worse, caused SQL queries such as "select * from Builds".
As the table is quite large, it caused some of the DB workers to
hang. Once all DB workers were hanging, the queries started to
accumulate until the open fd limit (1024) was reached.
I consolidated the validation of HTTP queries, and the Cuirass web server
has now been running for 48 hours, which has not happened in months, I think.
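For illustration only, strict validation of such a request path could be
sketched as below, in Python rather than Guile. The pattern and the
helper name parse_build_id are assumptions for this example and are not
taken from the Cuirass sources; the point is that the whole path must
match, so a trailing ")" is rejected instead of falling through to an
unbounded query.

```python
import re

# The entire path must match: a fixed prefix followed by decimal digits only.
BUILD_PATH = re.compile(r"/builds/([0-9]+)")

def parse_build_id(path):
    """Return the build id as an int, or None if the path is malformed."""
    match = BUILD_PATH.fullmatch(path)
    return int(match.group(1)) if match else None

valid = parse_build_id("/builds/1234")
malformed = parse_build_id("/builds/1234)")
```

A handler would answer with a 4xx error whenever the result is None,
before touching the database.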
I also added some warnings to detect DB workers hanging for more than 5
seconds. The next step is to log all SQL queries using [1]. This should
allow us to spot this kind of issue more easily. Logging the duration
of each query should also help us to optimize the queries.
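A rough sketch of per-query timing, using Python's sqlite3 module rather
than guile-sqlite3, might look like the following. The helper name
timed_query, the logger name, and the two-column Builds table are
assumptions for this example; only the 5 second threshold comes from
the message above.

```python
import logging
import sqlite3
import time

logger = logging.getLogger("sql")

def timed_query(db, sql, params=(), slow_after=5.0):
    """Run a query, log its duration, and warn when it is slow."""
    start = time.monotonic()
    rows = db.execute(sql, params).fetchall()
    elapsed = time.monotonic() - start
    # Slow queries are logged as warnings, all others at debug level.
    level = logging.WARNING if elapsed > slow_after else logging.DEBUG
    logger.log(level, "%.3fs %s", elapsed, sql)
    return rows, elapsed

# Toy in-memory database standing in for the real one.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Builds (id INTEGER PRIMARY KEY, status INTEGER)")
db.execute("INSERT INTO Builds VALUES (1234, 0)")
rows, elapsed = timed_query(db, "SELECT * FROM Builds WHERE id = ?", (1234,))
```

Note that this measures a query only after it returns; detecting a
worker that is still hanging needs a separate watchdog.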
I'll wait a few more days before closing this issue.
Thanks,
Mathieu
[1]: https://notabug.org/guile-sqlite3/guile-sqlite3/pulls/16
Reply sent to Mathieu Othacehe <othacehe <at> gnu.org>: You have taken responsibility. (Thu, 06 Aug 2020 08:18:02 GMT)
Notification sent to Mathieu Othacehe <othacehe <at> gnu.org>: bug acknowledged by developer. (Thu, 06 Aug 2020 08:18:02 GMT)
Message #28 received at 42548-done <at> debbugs.gnu.org (full text, mbox):
Hello,
> I'm still waiting a few days before closing this issue.
No issues so far, closing this one.
Mathieu
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 03 Sep 2020 11:24:08 GMT)
This bug report was last modified 3 years and 207 days ago.