GNU bug report logs - #43564
cuirass: Contention while registering new builds.

Previous Next

Package: guix;

Reported by: Mathieu Othacehe <othacehe <at> gnu.org>

Date: Tue, 22 Sep 2020 16:53:01 UTC

Severity: normal

Done: Mathieu Othacehe <othacehe <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 43564 in the body.
You can then email your comments to 43564 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#43564; Package guix. (Tue, 22 Sep 2020 16:53:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mathieu Othacehe <othacehe <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Tue, 22 Sep 2020 16:53:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: bug-guix <at> gnu.org
Subject: cuirass: Contention while registering new builds.
Date: Tue, 22 Sep 2020 18:52:39 +0200
[Message part 1 (text/plain, inline)]
Hello,

The attached screenshot shows that 9 evaluations are currently "In
progress" for "guix-master" specification.

Evaluations 16725 to 16738 are completed, 7 hours ago and 56 minutes ago
respectively. They are still shown as "In progress" because the
"register" phase in "build-packages" is not over. It is also possible to
see that new builds are registered for those evaluations, but very
slowly.

As each evaluation has to register around 50k builds, there might be
some sort of "writing" contention on the database, explaining the long
build registration time.

Thanks,

Mathieu
-- 
https://othacehe.org
[cuirass.png (image/png, attachment)]

Reply sent to Mathieu Othacehe <othacehe <at> gnu.org>:
You have taken responsibility. (Mon, 28 Sep 2020 07:52:01 GMT) Full text and rfc822 format available.

Notification sent to Mathieu Othacehe <othacehe <at> gnu.org>:
bug acknowledged by developer. (Mon, 28 Sep 2020 07:52:01 GMT) Full text and rfc822 format available.

Message #10 received at 43564-done <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 43564-done <at> debbugs.gnu.org
Subject: Re: bug#43564: cuirass: Contention while registering new builds.
Date: Mon, 28 Sep 2020 09:51:38 +0200
Hello,

> As each evaluation has to register around 50k builds, there might be
> some sort of "writing" contention on the database, explaining the long
> build registration time.

Turns out, once an evaluation is over, new builds are registered. This
registration tries to insert a new build for each derivation returned by
the evaluation phase. If the new build does not add a new output, the
insertion query is then rollbacked. This means that there are at least
as many insertion queries as new derivations.

SQlite allows at most one writer at a time, and even though we are using WAL
mode, performing a lot of insertions will reduce the reading performances.
When multiple evaluations are performed in parallel, the large number of
concurrent insertion queries also causes contention.

Having 10 pending evaluations means that we are probably trying to
insert around 500.000 records concurrently. This is something that
SQLite does not seem to be designed for.

To avoid those issues, we need to check first in the "Outputs" table
which derivations are already registered. This means that most of the
insertion queries will be replaced by reading queries, that are much
less expensive and more suitable for Cuirass concurrent implementation.

In the best case of one pending evaluation, the registration duration is
reduced from 1800 seconds to 320 seconds. I think that the gain is way
larger when there are multiple pending evaluations.

Pushed a fix as 461e07e14e1c8013343c0a2cb26c0e022e10d5e4.

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43564; Package guix. (Mon, 28 Sep 2020 08:18:01 GMT) Full text and rfc822 format available.

Message #13 received at 43564 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 43564 <at> debbugs.gnu.org
Subject: Re: bug#43564: cuirass: Contention while registering new builds.
Date: Mon, 28 Sep 2020 10:17:34 +0200
For future reference, here's a small addition. As I said, we are using
the Cuirass SQLite database in WAL mode. This means that insertion
queries are added to a separate cuirass.db-wal file. Once in a while,
this file is supposed to be merged in the main cuirass.db file.

This mechanism known as checkpointing is supposed to occur automatically
once 4MB of data are added to the WAL file. On berlin, the WAL file of
Cuirass is 9.4M and the Guix WAL file is almost 1G.

As reading from the WAL file is less effective, and checkpointing a
large file will end-up locking the database for some times, we should
definitely try to understand why the Guix WAL file get so big.

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43564; Package guix. (Tue, 29 Sep 2020 15:03:01 GMT) Full text and rfc822 format available.

Message #16 received at 43564 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 43564 <at> debbugs.gnu.org
Subject: Re: bug#43564: cuirass: Contention while registering new builds.
Date: Tue, 29 Sep 2020 17:02:14 +0200
Hello,

> In the best case of one pending evaluation, the registration duration is
> reduced from 1800 seconds to 320 seconds. I think that the gain is way
> larger when there are multiple pending evaluations.
>
> Pushed a fix as 461e07e14e1c8013343c0a2cb26c0e022e10d5e4.

While this improves the situation, another evaluation contention episode
occurred this afternoon (evaluations 16920 to 16938).

Even though reading queries are cheaper, I guess that when 10 fibers try
to execute 50k read queries, there's some contention in the
communication with the 4 database workers.

I'm trying locally to spawn database workers dedicated to registration,
that execute the evaluation queries in a single batch. Let's see if it
helps.

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43564; Package guix. (Sun, 04 Oct 2020 11:13:01 GMT) Full text and rfc822 format available.

Message #19 received at 43564 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 43564 <at> debbugs.gnu.org
Subject: Re: bug#43564: cuirass: Contention while registering new builds.
Date: Sun, 04 Oct 2020 13:12:46 +0200
Hello,

> I'm trying locally to spawn database workers dedicated to registration,
> that execute the evaluation queries in a single batch. Let's see if it
> helps.

I added 4 registration dedicated database workers and removed from
registration all accesses to the store. Now the evaluation returns
objects that contain all required information.

This seem to improve the situation quite a lot. For example, those three
registrations were done simultaneously in ~200 seconds:

--8<---------------cut here---------------start------------->8---
2020-10-04T12:46:19 registration took 205.429772 seconds
2020-10-04T12:46:19 evaluation 17099 registered 156 new derivations
2020-10-04T12:46:19 building 156 derivations in batches of 200
2020-10-04T12:46:19 building batch of 200 derivations (0/156)
2020-10-04T12:46:19 registration took 212.892482 seconds
2020-10-04T12:46:19 evaluation 17105 registered 164 new derivations
2020-10-04T12:46:19 building 164 derivations in batches of 200
2020-10-04T12:46:19 building batch of 200 derivations (0/164)
2020-10-04T12:46:19 registration took 205.009416 seconds
2020-10-04T12:46:19 evaluation 17102 registered 160 new derivations
2020-10-04T12:46:19 building 160 derivations in batches of 200
2020-10-04T12:46:19 building batch of 200 derivations (0/160)
--8<---------------cut here---------------end--------------->8---

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43564; Package guix. (Mon, 05 Oct 2020 12:08:02 GMT) Full text and rfc822 format available.

Message #22 received at 43564 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 43564 <at> debbugs.gnu.org
Cc: othacehe <at> gnu.org
Subject: Re: bug#43564: cuirass: Contention while registering new builds.
Date: Mon, 05 Oct 2020 14:07:44 +0200
Hello!

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

> Turns out, once an evaluation is over, new builds are registered. This
> registration tries to insert a new build for each derivation returned by
> the evaluation phase. If the new build does not add a new output, the
> insertion query is then rollbacked. This means that there are at least
> as many insertion queries as new derivations.

Isn’t that the real problem, that we’re doing one transaction per
derivation?

Ideally, upon evaluation completion, I’d expect us to create a single
transaction that registers all 50K derivations at once, no?

Thanks for the great improvements in this area!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#43564; Package guix. (Mon, 05 Oct 2020 13:10:02 GMT) Full text and rfc822 format available.

Message #25 received at 43564 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 43564 <at> debbugs.gnu.org
Subject: Re: bug#43564: cuirass: Contention while registering new builds.
Date: Mon, 05 Oct 2020 15:09:06 +0200
Hey Ludo,

> Isn’t that the real problem, that we’re doing one transaction per
> derivation?

Is it really better in term of performance to send batch of queries
within a single transaction? I haven't tried it yet.

I think that the real bottleneck was having N fibers fighting over 4
workers to execute large number of insertions. Having all those queries
done by workers dedicated to registration in a single pass seems to
improve a lot the situation.

Maybe having those registration workers posting unique transactions
containing all the queries would be even more beneficial.

Thanks,

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#43564; Package guix. (Fri, 09 Oct 2020 08:00:02 GMT) Full text and rfc822 format available.

Message #28 received at 43564 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 43564 <at> debbugs.gnu.org
Subject: Re: bug#43564: cuirass: Contention while registering new builds.
Date: Fri, 09 Oct 2020 09:59:31 +0200
Hi,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

>> Isn’t that the real problem, that we’re doing one transaction per
>> derivation?
>
> Is it really better in term of performance to send batch of queries
> within a single transaction? I haven't tried it yet.

I’m definitely not a database expert, but I think there’s overhead for
setting up a transaction, which we’re paying N times here.

> I think that the real bottleneck was having N fibers fighting over 4
> workers to execute large number of insertions. Having all those queries
> done by workers dedicated to registration in a single pass seems to
> improve a lot the situation.

Then that’s great, let’s profit!  :-)

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 06 Nov 2020 12:24:07 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 144 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.