GNU bug report logs -
#43850
cuirass: inconsistent SQL queries execution time.
Previous Next
Reported by: Mathieu Othacehe <othacehe <at> gnu.org>
Date: Wed, 7 Oct 2020 16:06:02 UTC
Severity: normal
Done: Mathieu Othacehe <othacehe <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 43850 in the body.
You can then email your comments to 43850 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Wed, 07 Oct 2020 16:06:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Mathieu Othacehe <othacehe <at> gnu.org>
:
New bug report received and forwarded. Copy sent to
bug-guix <at> gnu.org
.
(Wed, 07 Oct 2020 16:06:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
Over the last few weeks I made sure that all Cuirass SQL queries were
using indexes. As the "Builds" and "Outputs" tables can be really large,
having queries covered by indexes is imperative for consistent queries
duration.
However, I observed that some queries have inconsistent duration.
--8<---------------cut here---------------start------------->8---
2020-10-07T17:59:09 Database worker unresponsive for 5 seconds (db-get-builds-max).
2020-10-07T17:59:13 GET /build/3183151/details
2020-10-07T17:59:13 GET /eval/92
2020-10-07T17:59:14 builds request took 3.66e-4 seconds
2020-10-07T17:59:14 Database worker unresponsive for 5 seconds (db-get-builds-max).
2020-10-07T17:59:15 GET /eval/13234
2020-10-07T17:59:16 GET /build/3146487/details
2020-10-07T17:59:19 Database worker unresponsive for 5 seconds (db-get-builds-max).
--8<---------------cut here---------------end--------------->8---
"db-get-builds-max" query has the following query plan:
--8<---------------cut here---------------start------------->8---
QUERY PLAN
|--SEARCH TABLE Builds USING INDEX Builds_stoptime_id (stoptime=?)
`--SCALAR SUBQUERY 1
`--SEARCH TABLE Builds USING INDEX Builds_evaluation_index (evaluation=?)
--8<---------------cut here---------------end--------------->8---
so it should always reasonably fast. The log seem to indicate that the
worker running one of those queries is busy for more than 10 seconds
here. My understanding of SQLite WAL mode is that writers should not
block readers. So I don't get why this query can be intermittently so
slow.
Thanks,
Mathieu
--
https://othacehe.org
Information forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Wed, 14 Oct 2020 16:54:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 43850 <at> debbugs.gnu.org (full text, mbox):
Hello,
I pushed and deployed several patches that:
- update metrics in a single transaction
- register builds in a single transaction
- use a single write database worker, queuing queries and submitting
them by batches (in a single transaction).
- optimize some SQLite parameters (decrease WAL size, use more RAM
caching)
this should improve the situation, even if I still observe some
inconsistent execution duration.
I still have a few improvement ideas such as pre-allocating a large
database file to decrease fragmentation and running periodic vacuums.
Thanks,
Mathieu
Information forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Tue, 20 Oct 2020 11:46:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 43850 <at> debbugs.gnu.org (full text, mbox):
Hello,
> this should improve the situation, even if I still observe some
> inconsistent execution duration.
I tried to use the two following pragma:
--8<---------------cut here---------------start------------->8---
PRAGMA synchronous = OFF
PRAGMA mmap_size = 10737418240
--8<---------------cut here---------------end--------------->8---
hoping that mmaping the Cuirass database and disabling synchronisation
would help, but there were still inconsistent duration.
I have now copied the database to a tmpfs mounted directory to make sure
that those inconsistent duration are only caused by the I/O pressure on
berlin.
Thanks,
Mathieu
Information forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Thu, 22 Oct 2020 11:50:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 43850 <at> debbugs.gnu.org (full text, mbox):
Hello,
> I have now copied the database to a tmpfs mounted directory to make sure
> that those inconsistent duration are only caused by the I/O pressure on
> berlin.
This helps a lot. The Cuirass web service has been running smooth since
two days, without any inconsistent query times.
I'm considering using a tmpfs backed database for good. The problem is
that we would need a save/restore mechanism in case Berlin
reboots.
WDYT?
Thanks,
Mathieu
Information forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Mon, 26 Oct 2020 23:01:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 43850 <at> debbugs.gnu.org (full text, mbox):
Hi,
Mathieu Othacehe <othacehe <at> gnu.org> skribis:
>> I have now copied the database to a tmpfs mounted directory to make sure
>> that those inconsistent duration are only caused by the I/O pressure on
>> berlin.
>
> This helps a lot. The Cuirass web service has been running smooth since
> two days, without any inconsistent query times.
Interesting.
> I'm considering using a tmpfs backed database for good. The problem is
> that we would need a save/restore mechanism in case Berlin
> reboots.
Hmm sounds risky, no?
I wonder if we could instead ensure no I/O-intensive workload runs that
machine. I’m thinking in particular of the derivations that produce
ISO/qcow images that are not offloaded but maybe should.
WDYT? Do you think that’d be enough? Or is tmpfs our only hope?
Thanks,
Ludo’.
Information forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Tue, 27 Oct 2020 09:29:01 GMT)
Full text and
rfc822 format available.
Message #20 received at 43850 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:
> Hi,
>
> Mathieu Othacehe <othacehe <at> gnu.org> skribis:
>
>>> I have now copied the database to a tmpfs mounted directory to make sure
>>> that those inconsistent duration are only caused by the I/O pressure on
>>> berlin.
>>
>> This helps a lot. The Cuirass web service has been running smooth since
>> two days, without any inconsistent query times.
>
> Interesting.
>
>> I'm considering using a tmpfs backed database for good. The problem is
>> that we would need a save/restore mechanism in case Berlin
>> reboots.
>
> Hmm sounds risky, no?
>
> I wonder if we could instead ensure no I/O-intensive workload runs that
> machine. I’m thinking in particular of the derivations that produce
> ISO/qcow images that are not offloaded but maybe should.
>
> WDYT? Do you think that’d be enough? Or is tmpfs our only hope?
I think Ricardo mentioned that the machine running Cuirass uses an SSD
for the root filesystem, so moving the database there may help?
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Tue, 27 Oct 2020 14:11:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 43850 <at> debbugs.gnu.org (full text, mbox):
Hello Chris,
> I think Ricardo mentioned that the machine running Cuirass uses an SSD
> for the root filesystem, so moving the database there may help?
Looks like the database was already on the SSD before my tmpfs
experiment.
--8<---------------cut here---------------start------------->8---
mathieu <at> berlin ~$ df -h
Filesystem Size Used Avail Use% Mounted on
none 95G 0 95G 0% /dev
/dev/sda1 916G 321G 549G 37% /
/dev/sdb1 37T 34T 2.6T 94% /gnu
tmpfs 95G 8.0K 95G 1% /dev/shm
tmpfs 10G 2.4G 7.7G 24% /var/lib/cuirass_tmpfs
--8<---------------cut here---------------end--------------->8---
I don't really get why I/O pressure on /dev/sdb could impact /dev/sda.
Thanks,
Mathieu
Information forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Tue, 27 Oct 2020 18:12:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 43850 <at> debbugs.gnu.org (full text, mbox):
> I don't really get why I/O pressure on /dev/sdb could impact /dev/sda.
Turns out /tmp is mounted on /dev/sda, so all the building and ISO
production are first written on /dev/sda before being copied to the
store in /dev/sdb.
Reducing the build activity of berlin, as Ludo proposed should help
then.
Thanks,
Mathieu
Information forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Sat, 31 Oct 2020 04:34:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 43850 <at> debbugs.gnu.org (full text, mbox):
Hello,
Mathieu Othacehe <othacehe <at> gnu.org> writes:
> Hello Chris,
>
>> I think Ricardo mentioned that the machine running Cuirass uses an SSD
>> for the root filesystem, so moving the database there may help?
>
> Looks like the database was already on the SSD before my tmpfs
> experiment.
>
> mathieu <at> berlin ~$ df -h
> Filesystem Size Used Avail Use% Mounted on
> none 95G 0 95G 0% /dev
> /dev/sda1 916G 321G 549G 37% /
> /dev/sdb1 37T 34T 2.6T 94% /gnu
> tmpfs 95G 8.0K 95G 1% /dev/shm
> tmpfs 10G 2.4G 7.7G 24% /var/lib/cuirass_tmpfs
>
> I don't really get why I/O pressure on /dev/sdb could impact /dev/sda.
>
> Thanks,
>
> Mathieu
As an aside, running
--8<---------------cut here---------------start------------->8---
sudo sqlite3 /var/guix/db/db.sqlite vacuum
--8<---------------cut here---------------end--------------->8---
shaved off some 40 Mb from my large database file:
-rw-r--r-- 1 root root 468889600 Oct 31 00:16 db.sqlite
-rw-r--r-- 1 root root 510648320 Oct 28 23:36 db.sqlite.bak
Perhaps we should run 'vacuum' when invoking 'guix gc' or at some other
key places (where lots of data gets removed from the DB). There's also
the auto_vacuum PRAGMA, which is not enabled currently:
--8<---------------cut here---------------start------------->8---
sqlite3 /var/guix/db/db.sqlite 'pragma auto_vacuum'
0
--8<---------------cut here---------------end--------------->8---
But the later doesn't necessarily sound like a good idea:
Note, however, that auto-vacuum only truncates the freelist pages from
the file. Auto-vacuum does not defragment the database nor repack
individual database pages the way that the VACUUM command does. In fact,
because it moves pages around within the file, auto-vacuum can actually
make fragmentation worse. [0]
[0]: https://www.sqlite.org/pragma.html#pragma_auto_vacuum
Maxim
Information forwarded
to
bug-guix <at> gnu.org
:
bug#43850
; Package
guix
.
(Sun, 01 Nov 2020 18:24:02 GMT)
Full text and
rfc822 format available.
Message #32 received at 43850 <at> debbugs.gnu.org (full text, mbox):
Hey Maxim,
> Perhaps we should run 'vacuum' when invoking 'guix gc' or at some other
> key places (where lots of data gets removed from the DB). There's also
> the auto_vacuum PRAGMA, which is not enabled currently:
Vacuuming periodically seems important, however, it didn't resulted in
noticeable improvements during high I/O pressure when I tested it.
I still think that we should vacuum periodically, even though it means
that all database writes are postponed during the duration of the
vacuuming.
Having a separate fiber in charge of that operation could work I guess.
Thanks,
Mathieu
Reply sent
to
Mathieu Othacehe <othacehe <at> gnu.org>
:
You have taken responsibility.
(Thu, 25 Mar 2021 12:47:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Mathieu Othacehe <othacehe <at> gnu.org>
:
bug acknowledged by developer.
(Thu, 25 Mar 2021 12:47:02 GMT)
Full text and
rfc822 format available.
Message #37 received at 43850-done <at> debbugs.gnu.org (full text, mbox):
Closing as this is not any issue anymore since the switch to PostgreSQL.
Mathieu
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 23 Apr 2021 11:24:10 GMT)
Full text and
rfc822 format available.
This bug report was last modified 4 years and 78 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.