GNU bug report logs - #42548
Cuirass 504 errors


Package: guix;

Reported by: Mathieu Othacehe <othacehe <at> gnu.org>

Date: Sun, 26 Jul 2020 16:11:02 UTC

Severity: normal

Done: Mathieu Othacehe <othacehe <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 42548 in the body.
You can then email your comments to 42548 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-guix <at> gnu.org:
bug#42548; Package guix. (Sun, 26 Jul 2020 16:11:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mathieu Othacehe <othacehe <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sun, 26 Jul 2020 16:11:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: bug-guix <at> gnu.org
Subject: Cuirass 504 errors
Date: Sun, 26 Jul 2020 18:10:55 +0200
Hello,

Back from holidays, perfect time to fix some Cuirass issues :) The
Cuirass web interface frequently serves 504 errors for all requests,
requiring a service restart on berlin.

Having a look at /var/log/cuirass-web.log, it seems that we indeed have
multiple things going wrong.

A first problem is caused by checkout entries pointing to removed
inputs. This should be fixed by f71f026a41d8e68e4a7f11ef6e708964594a599c
in Cuirass.

A second issue is caused when a build product download is started, then
aborted. In that case, sendfile throws an exception or enters an endless
loop.

There's a third issue, but the cause is not clear to me:

--8<---------------cut here---------------start------------->8---
Uncaught exception in fiber ##f:
In ice-9/boot-9.scm:
  1736:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In web/server/fiberized.scm:
   160:26  4 (_)
In ice-9/suspendable-ports.scm:
     83:4  3 (write-bytes #<closed: file 7f3a4ed46310> #vu8(60 33 ?) ?)
In unknown file:
           2 (port-write #<closed: file 7f3a4ed46310> #vu8(60 33 # ?) ?)
In ice-9/boot-9.scm:
  1669:16  1 (raise-exception _ #:continuable? _)
  1669:16  0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
--8<---------------cut here---------------end--------------->8---

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#42548; Package guix. (Mon, 27 Jul 2020 22:13:02 GMT) Full text and rfc822 format available.

Message #8 received at 42548 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Mathieu Othacehe <othacehe <at> gnu.org>, 42548 <at> debbugs.gnu.org
Subject: Re: bug#42548: Cuirass 504 errors
Date: Tue, 28 Jul 2020 00:11:55 +0200
Hi Mathieu,

On Sun, 26 Jul 2020 at 18:10, Mathieu Othacehe <othacehe <at> gnu.org> wrote:

> A second issue is caused when a build product download is started, then
> aborted. In that case, sendfile throws an exception or enters an endless
> loop.

What do you mean by “build product download is started, then aborted”?

Cheers,
simon




Information forwarded to bug-guix <at> gnu.org:
bug#42548; Package guix. (Tue, 28 Jul 2020 07:33:02 GMT) Full text and rfc822 format available.

Message #11 received at 42548 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 42548 <at> debbugs.gnu.org
Subject: Re: bug#42548: Cuirass 504 errors
Date: Tue, 28 Jul 2020 09:32:09 +0200
Hey zimoun,

> What do you mean by “build product download is started, then aborted”?

Here I mean clicking on the downloadable image here[1] and then hitting
"cancel" when the download popup appears, or the abort button later on,
once the download has started.

Thanks,

Mathieu

[1]: https://ci.guix.gnu.org/build/3031091/details




Information forwarded to bug-guix <at> gnu.org:
bug#42548; Package guix. (Tue, 28 Jul 2020 08:50:02 GMT) Full text and rfc822 format available.

Message #14 received at 42548 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 42548 <at> debbugs.gnu.org
Subject: Re: bug#42548: Cuirass 504 errors
Date: Tue, 28 Jul 2020 10:49:43 +0200
Hi Mathieu,

On Tue, 28 Jul 2020 at 09:32, Mathieu Othacehe <othacehe <at> gnu.org> wrote:

> Here I mean clicking on the downloadable image here[1] and then hitting
> "cancel" when the download popup appears, or the abort button later on,
> once the download has started.

Ah, that’s annoying indeed. :-)

And does it mess up Cuirass if the connection is lost, e.g. when the
network goes down?

Cheers,
simon




Information forwarded to bug-guix <at> gnu.org:
bug#42548; Package guix. (Tue, 28 Jul 2020 14:58:01 GMT) Full text and rfc822 format available.

Message #17 received at 42548 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 42548 <at> debbugs.gnu.org
Subject: Re: bug#42548: Cuirass 504 errors
Date: Tue, 28 Jul 2020 16:56:57 +0200
> And does it mess up Cuirass if the connection is lost, e.g. when the
> network goes down?

Not sure yet, I also found this message:

--8<---------------cut here---------------start------------->8---
Uncaught exception in fiber ##f:
In ice-9/boot-9.scm:
  1736:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In web/server/fiberized.scm:
   160:26  4 (_)
In ice-9/suspendable-ports.scm:
     83:4  3 (write-bytes #<closed: file 7ff11c2dec40> #vu8(60 33 ?) ?)
In unknown file:
           2 (port-write #<closed: file 7ff11c2dec40> #vu8(60 33 # ?) ?)
In ice-9/boot-9.scm:
  1669:16  1 (raise-exception _ #:continuable? _)
  1669:16  0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---

that suggests that we try to write something to a closed file.

To be investigated :)

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#42548; Package guix. (Thu, 30 Jul 2020 14:48:01 GMT) Full text and rfc822 format available.

Message #20 received at 42548 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 42548 <at> debbugs.gnu.org
Subject: Re: bug#42548: Cuirass 504 errors
Date: Thu, 30 Jul 2020 16:47:12 +0200
Hey,

> A second issue is caused when a build product download is started, then
> aborted. In that case, sendfile throws an exception or enters an endless
> loop.

Ok, so I found a couple of errors here. First, I noticed that it was not
possible to download two build products simultaneously, because the
first download was blocking the whole process.

This is solved by 6ad9c602697ffe33c8fbb09ccd796b74bf600223. In short,
current-fiber was set to #f, both in the context of the caller and the
spawned thread. So I think the get-message operation was blocking the
whole thread instead of suspending the current fiber. But if someone
else could take a look, it would be nice :).
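
A rough analogy for the difference between blocking the whole thread and
suspending only the current task, sketched with Python's asyncio rather
than Guile Fibers. None of this is Cuirass code; `blocking_read`,
`count_ticks` and `run` are hypothetical names chosen for the illustration.

```python
import asyncio
import time


def blocking_read(seconds):
    """Stand-in for an operation that blocks the calling OS thread."""
    time.sleep(seconds)
    return "done"


async def count_ticks(stop):
    """Count how often this task gets scheduled until `stop` is set."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def run(offload, seconds=0.2):
    """Return how many times the ticker ran during one blocking read."""
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # let the ticker run once before we start
    if offload:
        # The scheduler thread stays free: only this task is suspended.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_read, seconds)
    else:
        # Called directly, the read stalls every task on this thread.
        blocking_read(seconds)
    stop.set()
    return await ticker
```

Calling `asyncio.run(run(offload=False))` starves the ticker for the whole
read, while `asyncio.run(run(offload=True))` lets it keep running, which is
the behaviour one would want for two simultaneous downloads.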

Second issue: sendfile may throw EPIPE or ECONNRESET if the client
disconnects before the end of the transfer. I think that, besides the dirty
backtrace, it was not harmful. But anyway, it's better to catch this as
we are doing in "guix publish", see
0955a11abd9e27c96a1375cca6a1c97869b5780a.
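
The general idea of treating a client disconnect as an expected event can
be sketched as follows. This is a Python illustration, not the Guile code
from the commit above; `send_file` and `CHUNK_SIZE` are hypothetical names,
and a plain chunked `sendall` loop stands in for sendfile.

```python
import errno

CHUNK_SIZE = 64 * 1024


def send_file(sock, path):
    """Send the file at `path` over the connected socket `sock`.

    Return True when the whole file was sent, False when the peer went
    away first (EPIPE or ECONNRESET). Any other error propagates.
    """
    try:
        with open(path, "rb") as source:
            while True:
                chunk = source.read(CHUNK_SIZE)
                if not chunk:
                    return True
                sock.sendall(chunk)
    except OSError as err:
        if err.errno in (errno.EPIPE, errno.ECONNRESET):
            return False  # client disconnected: expected, not a server bug
        raise
```

Only the two "client went away" errors are swallowed, so genuine I/O
problems still surface in the log.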

I fear it won't be enough to fix the 504 errors, but at least it's a
start.

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#42548; Package guix. (Tue, 04 Aug 2020 16:49:01 GMT) Full text and rfc822 format available.

Message #23 received at 42548 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 42548 <at> debbugs.gnu.org
Subject: Re: bug#42548: Cuirass 504 errors
Date: Tue, 04 Aug 2020 18:48:24 +0200
Hello,

> that suggests that we try to write something to a closed file.
>
> To be investigated :)

Ok, so I have a better grasp of what's going on. The Cuirass web server was
receiving some requests such as "/builds/1234)" which were not rejected
and, worse, caused SQL queries such as "select * from Builds".

As the table is quite large, it caused some of the DB workers to
hang. Once all DB workers were hanging, the queries started to
accumulate until the open fd limit (1024) was reached.

I consolidated the validation of HTTP requests, and the Cuirass web server
has now been running for 48 hours, which has not happened in months, I think.
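
A minimal sketch of this kind of strict validation, in Python rather than
Guile and not taken from Cuirass. The route shape, the `parse_build_id` and
`handle` names, the 404 status and the `id` column are all assumptions made
for the example.

```python
import re

# Hypothetical route shape: "/builds/<decimal id>", nothing before or after.
BUILD_PATH = re.compile(r"/builds/([0-9]+)")


def parse_build_id(path):
    """Return the build id as an int, or None if the path is malformed."""
    match = BUILD_PATH.fullmatch(path)
    if match is None:
        return None
    return int(match.group(1))


def handle(path):
    """Map a request path to an (HTTP status, parameterized query) pair."""
    build_id = parse_build_id(path)
    if build_id is None:
        return 404, None  # reject instead of falling back to a full scan
    return 200, ("SELECT * FROM Builds WHERE id = ?", (build_id,))
```

With `fullmatch`, a path like "/builds/1234)" is rejected up front, so it
can never reach the database as an unfiltered query.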

I also added some warnings to detect DB workers hanging for more than 5
seconds. The next step is to log all SQL queries using [1]. This should
allow us to spot this kind of issue more easily. Logging the duration
of each query should also help us to optimize the queries.
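
Both ideas (a warning when a worker runs too long, and a per-query duration
log) can be sketched like this. It uses Python's sqlite3 module, not
guile-sqlite3, and is not the Cuirass implementation; `watchdog`,
`timed_query` and the `on_timeout` hook are hypothetical names.

```python
import logging
import threading
import time
from contextlib import contextmanager

log = logging.getLogger("db-worker")


@contextmanager
def watchdog(label, limit_seconds, on_timeout=None):
    """Warn if the enclosed block is still running after `limit_seconds`."""
    def fire():
        log.warning("%s still running after %.1f s", label, limit_seconds)
        if on_timeout is not None:
            on_timeout(label)

    timer = threading.Timer(limit_seconds, fire)
    timer.daemon = True
    timer.start()
    try:
        yield
    finally:
        timer.cancel()  # no warning if the block finished in time


def timed_query(conn, sql, params=(), limit_seconds=5.0, on_timeout=None):
    """Run a query on a sqlite3 connection, log its duration, return both."""
    start = time.monotonic()
    with watchdog(sql, limit_seconds, on_timeout):
        rows = conn.execute(sql, params).fetchall()
    elapsed = time.monotonic() - start
    log.info("%.3f s  %s", elapsed, sql)
    return rows, elapsed
```

The timer fires while the query is still running, so a worker that never
returns is still reported; the duration log only covers queries that finish.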

I'm still waiting a few days before closing this issue.

Thanks,

Mathieu

[1]: https://notabug.org/guile-sqlite3/guile-sqlite3/pulls/16 




Reply sent to Mathieu Othacehe <othacehe <at> gnu.org>:
You have taken responsibility. (Thu, 06 Aug 2020 08:18:02 GMT) Full text and rfc822 format available.

Notification sent to Mathieu Othacehe <othacehe <at> gnu.org>:
bug acknowledged by developer. (Thu, 06 Aug 2020 08:18:02 GMT) Full text and rfc822 format available.

Message #28 received at 42548-done <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 42548-done <at> debbugs.gnu.org
Subject: Re: bug#42548: Cuirass 504 errors
Date: Thu, 06 Aug 2020 10:16:52 +0200
Hello,

> I'm still waiting a few days before closing this issue.

No issues so far, closing this one.

Mathieu




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 03 Sep 2020 11:24:08 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 207 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.