GNU bug report logs - #43850
cuirass: inconsistent SQL queries execution time.

Previous Next

Package: guix;

Reported by: Mathieu Othacehe <othacehe <at> gnu.org>

Date: Wed, 7 Oct 2020 16:06:02 UTC

Severity: normal

Done: Mathieu Othacehe <othacehe <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 43850 in the body.
You can then email your comments to 43850 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Wed, 07 Oct 2020 16:06:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mathieu Othacehe <othacehe <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Wed, 07 Oct 2020 16:06:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: bug-guix <at> gnu.org
Subject: cuirass: inconsistent SQL queries execution time.
Date: Wed, 07 Oct 2020 18:04:52 +0200
Hello,

Over the last few weeks I made sure that all Cuirass SQL queries were
using indexes. As the "Builds" and "Outputs" tables can be really large,
having queries covered by indexes is imperative for consistent queries
duration.

However, I observed that some queries have inconsistent duration.

--8<---------------cut here---------------start------------->8---
2020-10-07T17:59:09 Database worker unresponsive for 5 seconds (db-get-builds-max).
2020-10-07T17:59:13 GET /build/3183151/details
2020-10-07T17:59:13 GET /eval/92
2020-10-07T17:59:14 builds request took 3.66e-4 seconds
2020-10-07T17:59:14 Database worker unresponsive for 5 seconds (db-get-builds-max).
2020-10-07T17:59:15 GET /eval/13234
2020-10-07T17:59:16 GET /build/3146487/details
2020-10-07T17:59:19 Database worker unresponsive for 5 seconds (db-get-builds-max).
--8<---------------cut here---------------end--------------->8---

"db-get-builds-max" query has the following query plan:

--8<---------------cut here---------------start------------->8---
QUERY PLAN
|--SEARCH TABLE Builds USING INDEX Builds_stoptime_id (stoptime=?)
`--SCALAR SUBQUERY 1
   `--SEARCH TABLE Builds USING INDEX Builds_evaluation_index (evaluation=?)
--8<---------------cut here---------------end--------------->8---

so it should always reasonably fast. The log seem to indicate that the
worker running one of those queries is busy for more than 10 seconds
here. My understanding of SQLite WAL mode is that writers should not
block readers. So I don't get why this query can be intermittently so
slow.

Thanks,

Mathieu
-- 
https://othacehe.org




Information forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Wed, 14 Oct 2020 16:54:02 GMT) Full text and rfc822 format available.

Message #8 received at 43850 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 43850 <at> debbugs.gnu.org
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Wed, 14 Oct 2020 18:53:20 +0200
Hello,

I pushed and deployed several patches that:

- update metrics in a single transaction
- register builds in a single transaction
- use a single write database worker, queuing queries and submitting
them by batches (in a single transaction).
- optimize some SQLite parameters (decrease WAL size, use more RAM
caching)

this should improve the situation, even if I still observe some
inconsistent execution duration.

I still have a few improvement ideas such as pre-allocating a large
database file to decrease fragmentation and running periodic vacuums.

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Tue, 20 Oct 2020 11:46:01 GMT) Full text and rfc822 format available.

Message #11 received at 43850 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 43850 <at> debbugs.gnu.org
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Tue, 20 Oct 2020 13:45:12 +0200
Hello,

> this should improve the situation, even if I still observe some
> inconsistent execution duration.

I tried to use the two following pragma:

--8<---------------cut here---------------start------------->8---
PRAGMA synchronous = OFF
PRAGMA mmap_size = 10737418240
--8<---------------cut here---------------end--------------->8---

hoping that mmaping the Cuirass database and disabling synchronisation
would help, but there were still inconsistent duration.

I have now copied the database to a tmpfs mounted directory to make sure
that those inconsistent duration are only caused by the I/O pressure on
berlin.

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Thu, 22 Oct 2020 11:50:02 GMT) Full text and rfc822 format available.

Message #14 received at 43850 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 43850 <at> debbugs.gnu.org
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Thu, 22 Oct 2020 13:49:43 +0200
Hello,

> I have now copied the database to a tmpfs mounted directory to make sure
> that those inconsistent duration are only caused by the I/O pressure on
> berlin.

This helps a lot. The Cuirass web service has been running smooth since
two days, without any inconsistent query times.

I'm considering using a tmpfs backed database for good. The problem is
that we would need a save/restore mechanism in case Berlin
reboots.

WDYT?

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Mon, 26 Oct 2020 23:01:02 GMT) Full text and rfc822 format available.

Message #17 received at 43850 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 43850 <at> debbugs.gnu.org
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Mon, 26 Oct 2020 23:59:51 +0100
Hi,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

>> I have now copied the database to a tmpfs mounted directory to make sure
>> that those inconsistent duration are only caused by the I/O pressure on
>> berlin.
>
> This helps a lot. The Cuirass web service has been running smooth since
> two days, without any inconsistent query times.

Interesting.

> I'm considering using a tmpfs backed database for good. The problem is
> that we would need a save/restore mechanism in case Berlin
> reboots.

Hmm sounds risky, no?

I wonder if we could instead ensure no I/O-intensive workload runs that
machine.  I’m thinking in particular of the derivations that produce
ISO/qcow images that are not offloaded but maybe should.

WDYT?  Do you think that’d be enough?  Or is tmpfs our only hope?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Tue, 27 Oct 2020 09:29:01 GMT) Full text and rfc822 format available.

Message #20 received at 43850 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: Ludovic Courtès <ludo <at> gnu.org>, Mathieu Othacehe
 <othacehe <at> gnu.org>
Cc: 43850 <at> debbugs.gnu.org
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Tue, 27 Oct 2020 09:28:54 +0000
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi,
>
> Mathieu Othacehe <othacehe <at> gnu.org> skribis:
>
>>> I have now copied the database to a tmpfs mounted directory to make sure
>>> that those inconsistent duration are only caused by the I/O pressure on
>>> berlin.
>>
>> This helps a lot. The Cuirass web service has been running smooth since
>> two days, without any inconsistent query times.
>
> Interesting.
>
>> I'm considering using a tmpfs backed database for good. The problem is
>> that we would need a save/restore mechanism in case Berlin
>> reboots.
>
> Hmm sounds risky, no?
>
> I wonder if we could instead ensure no I/O-intensive workload runs that
> machine.  I’m thinking in particular of the derivations that produce
> ISO/qcow images that are not offloaded but maybe should.
>
> WDYT?  Do you think that’d be enough?  Or is tmpfs our only hope?

I think Ricardo mentioned that the machine running Cuirass uses an SSD
for the root filesystem, so moving the database there may help?
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Tue, 27 Oct 2020 14:11:02 GMT) Full text and rfc822 format available.

Message #23 received at 43850 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 43850 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Tue, 27 Oct 2020 15:10:03 +0100
Hello Chris,

> I think Ricardo mentioned that the machine running Cuirass uses an SSD
> for the root filesystem, so moving the database there may help?

Looks like the database was already on the SSD before my tmpfs
experiment.

--8<---------------cut here---------------start------------->8---
mathieu <at> berlin ~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
none             95G     0   95G   0% /dev
/dev/sda1       916G  321G  549G  37% /
/dev/sdb1        37T   34T  2.6T  94% /gnu
tmpfs            95G  8.0K   95G   1% /dev/shm
tmpfs            10G  2.4G  7.7G  24% /var/lib/cuirass_tmpfs
--8<---------------cut here---------------end--------------->8---

I don't really get why I/O pressure on /dev/sdb could impact /dev/sda.

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Tue, 27 Oct 2020 18:12:02 GMT) Full text and rfc822 format available.

Message #26 received at 43850 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 43850 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Tue, 27 Oct 2020 19:11:28 +0100
> I don't really get why I/O pressure on /dev/sdb could impact /dev/sda.

Turns out /tmp is mounted on /dev/sda, so all the building and ISO
production are first written on /dev/sda before being copied to the
store in /dev/sdb.

Reducing the build activity of berlin, as Ludo proposed should help
then.

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Sat, 31 Oct 2020 04:34:02 GMT) Full text and rfc822 format available.

Message #29 received at 43850 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 43850 <at> debbugs.gnu.org, Christopher Baines <mail <at> cbaines.net>
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Sat, 31 Oct 2020 00:33:08 -0400
Hello,

Mathieu Othacehe <othacehe <at> gnu.org> writes:

> Hello Chris,
>
>> I think Ricardo mentioned that the machine running Cuirass uses an SSD
>> for the root filesystem, so moving the database there may help?
>
> Looks like the database was already on the SSD before my tmpfs
> experiment.
>
> mathieu <at> berlin ~$ df -h
> Filesystem      Size  Used Avail Use% Mounted on
> none             95G     0   95G   0% /dev
> /dev/sda1       916G  321G  549G  37% /
> /dev/sdb1        37T   34T  2.6T  94% /gnu
> tmpfs            95G  8.0K   95G   1% /dev/shm
> tmpfs            10G  2.4G  7.7G  24% /var/lib/cuirass_tmpfs
>
> I don't really get why I/O pressure on /dev/sdb could impact /dev/sda.
>
> Thanks,
>
> Mathieu

As an aside, running

--8<---------------cut here---------------start------------->8---
sudo sqlite3 /var/guix/db/db.sqlite vacuum
--8<---------------cut here---------------end--------------->8---

shaved off some 40 Mb from my large database file:

-rw-r--r-- 1 root root 468889600 Oct 31 00:16 db.sqlite
-rw-r--r-- 1 root root 510648320 Oct 28 23:36 db.sqlite.bak

Perhaps we should run 'vacuum' when invoking 'guix gc' or at some other
key places (where lots of data gets removed from the DB).  There's also
the auto_vacuum PRAGMA, which is not enabled currently:

--8<---------------cut here---------------start------------->8---
sqlite3 /var/guix/db/db.sqlite 'pragma auto_vacuum'
0
--8<---------------cut here---------------end--------------->8---

But the later doesn't necessarily sound like a good idea:

   Note, however, that auto-vacuum only truncates the freelist pages from
   the file. Auto-vacuum does not defragment the database nor repack
   individual database pages the way that the VACUUM command does. In fact,
   because it moves pages around within the file, auto-vacuum can actually
   make fragmentation worse. [0]

[0]:  https://www.sqlite.org/pragma.html#pragma_auto_vacuum

Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#43850; Package guix. (Sun, 01 Nov 2020 18:24:02 GMT) Full text and rfc822 format available.

Message #32 received at 43850 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 43850 <at> debbugs.gnu.org, Christopher Baines <mail <at> cbaines.net>
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Sun, 01 Nov 2020 19:23:14 +0100
Hey Maxim,

> Perhaps we should run 'vacuum' when invoking 'guix gc' or at some other
> key places (where lots of data gets removed from the DB).  There's also
> the auto_vacuum PRAGMA, which is not enabled currently:

Vacuuming periodically seems important, however, it didn't resulted in
noticeable improvements during high I/O pressure when I tested it.

I still think that we should vacuum periodically, even though it means
that all database writes are postponed during the duration of the
vacuuming.

Having a separate fiber in charge of that operation could work I guess.

Thanks,

Mathieu




Reply sent to Mathieu Othacehe <othacehe <at> gnu.org>:
You have taken responsibility. (Thu, 25 Mar 2021 12:47:02 GMT) Full text and rfc822 format available.

Notification sent to Mathieu Othacehe <othacehe <at> gnu.org>:
bug acknowledged by developer. (Thu, 25 Mar 2021 12:47:02 GMT) Full text and rfc822 format available.

Message #37 received at 43850-done <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 43850-done <at> debbugs.gnu.org
Subject: Re: bug#43850: cuirass: inconsistent SQL queries execution time.
Date: Thu, 25 Mar 2021 13:46:25 +0100
Closing as this is not any issue anymore since the switch to PostgreSQL.

Mathieu




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 23 Apr 2021 11:24:10 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 77 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.