GNU bug report logs - #36630
[PATCH] guix: parallelize building the manual-database

Previous Next

Package: guix-patches;

Reported by: arne_bab <at> web.de

Date: Fri, 12 Jul 2019 21:44:01 UTC

Severity: normal

Tags: patch

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36630 in the body.
You can then email your comments to 36630 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Fri, 12 Jul 2019 21:44:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to arne_bab <at> web.de:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Fri, 12 Jul 2019 21:44:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Arne Babenhauserheide <arne_bab <at> web.de>
To: guix-patches <at> gnu.org
Cc: Arne Babenhauserheide <arne_bab <at> web.de>
Subject: [PATCH] guix: parallelize building the manual-database
Date: Fri, 12 Jul 2019 23:42:45 +0200
* guix/profiles.scm (manual-database): par-map over the entries.  This
distributes the load roughly equally over all cores and avoids blocking on
I/O.  The order of the entries stays the same since write-mandb-database sorts
them.
---
 guix/profiles.scm | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/guix/profiles.scm b/guix/profiles.scm
index f5c863945c..374f0f8a90 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -1312,15 +1312,11 @@ the entries in MANIFEST."
         #~(begin
             (use-modules (guix man-db)
                          (guix build utils)
+                         (ice-9 threads)
                          (srfi srfi-1)
                          (srfi srfi-19))

-            (define (compute-entries)
-              ;; This is the most expensive part (I/O and CPU, due to
-              ;; decompression), so report progress as we traverse INPUTS.
-              (let* ((inputs '#$(manifest-inputs manifest))
-                     (total  (length inputs)))
-                (append-map (lambda (directory count)
+            (define (compute-entry directory count total)
                               (format #t "\r[~3d/~3d] building list of \
 man-db entries..."
                                       count total)
@@ -1330,8 +1326,16 @@ man-db entries..."
                                 (if (directory-exists? man)
                                     (mandb-entries man)
                                     '())))
-                            inputs
-                            (iota total 1))))
+
+            (define (compute-entries)
+              ;; This is the most expensive part (I/O and CPU, due to
+              ;; decompression), so report progress as we traverse INPUTS.
+              (let* ((inputs '#$(manifest-inputs manifest))
+                     (total  (length inputs)))
+                (apply append (par-map compute-entry
+                                       inputs
+                                       (iota total 1)
+                                       (make-list total total)))))

             (define man-directory
               (string-append #$output "/share/man"))
--
2.22.0





Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Mon, 15 Jul 2019 16:13:06 GMT) Full text and rfc822 format available.

Message #8 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Arne Babenhauserheide <arne_bab <at> web.de>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Mon, 15 Jul 2019 18:12:27 +0200
Hi Arne,

Arne Babenhauserheide <arne_bab <at> web.de> skribis:

> * guix/profiles.scm (manual-database): par-map over the entries.  This
> distributes the load roughly equally over all cores and avoids blocking on
> I/O.  The order of the entries stays the same since write-mandb-database sorts
> them.

I would think the whole process is largely I/O-bound.  Did you try
measuring differences?

I picked the manual-database derivation returned for:

  guix environment --ad-hoc jupyter python-ipython python-ipykernel -n

(It has 3,046 entries.)

On a SSD and with a hot cache, on my 4-core laptop, I get 74s with
‘master’, and 53s with this patch.

However, it will definitely not scale linearly, so we should probably
cap at 2 or 4 threads.  WDYT?

Another issue with the patch is that the [n/total] counter does not grow
monotically now: it might temporally go backwards.  Consequently, at
-v1, users will see a progress bar that hesitates and occasionally goes
backward, which isn’t great.

This would need to fix it with a mutex-protected global counter.

All in all, I’m not sure this is worth the complexity.

WDYT?

Thanks,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Mon, 15 Jul 2019 23:34:01 GMT) Full text and rfc822 format available.

Message #11 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Arne Babenhauserheide <arne_bab <at> web.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Tue, 16 Jul 2019 01:32:46 +0200
[Message part 1 (text/plain, inline)]
Hi Ludo’,

Ludovic Courtès <ludo <at> gnu.org> writes:
>> * guix/profiles.scm (manual-database): par-map over the entries.  This
>> distributes the load roughly equally over all cores and avoids blocking on
>> I/O.  The order of the entries stays the same since write-mandb-database sorts
>> them.
>
> I would think the whole process is largely I/O-bound.  Did you try
> measuring differences?

I did not measure the difference in build-time, but I did check the
system load. Without this patch, one of my cores is under full
load. With this patch all 12 hyperthreads have a mean load of 50%.

> I picked the manual-database derivation returned for:
>   guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
> (It has 3,046 entries.)

How exactly did you run the derivation? I’d like to check it if you can
give me the exact commandline to run (a command I can run repeatedly).


> On a SSD and with a hot cache, on my 4-core laptop, I get 74s with
> ‘master’, and 53s with this patch.

I’m using a machine with 6 physical cores, hyperthreading, and an NVMe
M.2 disk, so it is likely that it would not be disk-bound for me at 4
threads.

> However, it will definitely not scale linearly, so we should probably
> cap at 2 or 4 threads.  WDYT?

Looking at the underlying action, this seems to be a task that scales
pretty well. It just unpacks files into the disk-cache.

It should also not consume much memory, so I don’t see a reason to
artificially limit the number of threads.

> Another issue with the patch is that the [n/total] counter does not grow
> monotically now: it might temporally go backwards.  Consequently, at
> -v1, users will see a progress bar that hesitates and occasionally goes
> backward, which isn’t great.

It typically jumps forward in the beginning and then stalls until the
first manual page is finished.

Since par-map uses a global queue of futures to process, and since the
output is the first part of (compute-entry …), I don’t expect the
progress to move backwards in ways a user sees: It could only move
backwards during the initial step where all threads start at the same
time, and there the initial output should be overwritten fast enough to
not be noticeable.

> This would need to fix it with a mutex-protected global counter.

A global counter would be pretty bad for scaling. As it is, this code
needs no communication between processes besides returning the final
result, so it behaves exactly like a normal map, aside from being
faster. So I’d prefer to accept the forward-jumping.

> All in all, I’m not sure this is worth the complexity.
>
> WDYT?

Given that building manual pages is the most timeconsuming part when
installing a small tool into my profile, I think it is worth the
complexity. Especially because most of the complexity is being taken
care of by (ice-9 threads par-map).

Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Tue, 16 Jul 2019 21:15:01 GMT) Full text and rfc822 format available.

Message #14 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Arne Babenhauserheide <arne_bab <at> web.de>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Tue, 16 Jul 2019 23:14:48 +0200
Hello,

Arne Babenhauserheide <arne_bab <at> web.de> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:

[...]

>> I picked the manual-database derivation returned for:
>>   guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
>> (It has 3,046 entries.)
>
> How exactly did you run the derivation? I’d like to check it if you can
> give me the exact commandline to run (a command I can run repeatedly).

If you run the command above, it’ll list
/gnu/store/…-manual-database.drv.  So you can just run:

  guix build /gnu/store/…-manual-database.drv

or:

  guix build /gnu/store/…-manual-database.drv --check

if it had already been built before.

>> On a SSD and with a hot cache, on my 4-core laptop, I get 74s with
>> ‘master’, and 53s with this patch.
>
> I’m using a machine with 6 physical cores, hyperthreading, and an NVMe
> M.2 disk, so it is likely that it would not be disk-bound for me at 4
> threads.

The result may be entirely different with a spinning disk.  :-)

I’m not saying we should optimize for spinning disks, just that what you
see is at one end of the spectrum.

>> However, it will definitely not scale linearly, so we should probably
>> cap at 2 or 4 threads.  WDYT?
>
> Looking at the underlying action, this seems to be a task that scales
> pretty well. It just unpacks files into the disk-cache.
>
> It should also not consume much memory, so I don’t see a reason to
> artificially limit the number of threads.

On a many-core machine like we have in our build farm, with spinning
disks, I believe that using one thread per core would be
counterproductive.

>> Another issue with the patch is that the [n/total] counter does not grow
>> monotically now: it might temporally go backwards.  Consequently, at
>> -v1, users will see a progress bar that hesitates and occasionally goes
>> backward, which isn’t great.
>
> It typically jumps forward in the beginning and then stalls until the
> first manual page is finished.
>
> Since par-map uses a global queue of futures to process, and since the
> output is the first part of (compute-entry …), I don’t expect the
> progress to move backwards in ways a user sees: It could only move
> backwards during the initial step where all threads start at the same
> time, and there the initial output should be overwritten fast enough to
> not be noticeable.

Hmm, maybe.  I’m sure we’ll get reports saying this looks weird and
Something Must Absolutely Be Done About It.  :-)

But anyway, another issue is that we would need to honor
‘parallel-job-count’, which means using ‘n-par-map’, which doesn’t use
futures.

> Given that building manual pages is the most timeconsuming part when
> installing a small tool into my profile, I think it is worth the
> complexity. Especially because most of the complexity is being taken
> care of by (ice-9 threads par-map).

Just today I realized that the example above (with Jupyter) has so many
entries because of propagated inputs; in particular libxext along brings
1,000+ man pages.  We should definitely do something about these
packages.

Needs more thought…

Thanks,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Wed, 17 Jul 2019 22:07:02 GMT) Full text and rfc822 format available.

Message #17 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Arne Babenhauserheide <arne_bab <at> web.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Thu, 18 Jul 2019 00:06:36 +0200
[Message part 1 (text/plain, inline)]
Hi,

Ludovic Courtès <ludo <at> gnu.org> skribis:
> Arne Babenhauserheide <arne_bab <at> web.de> skribis:
Offtopic: I love reading Esperanto here!
>> Ludovic Courtès <ludo <at> gnu.org> writes:

>>>   guix environment --ad-hoc jupyter python-ipython python-ipykernel -n

>> How exactly did you run the derivation?

> If you run the command above, it’ll list
> /gnu/store/…-manual-database.drv.  So you can just run:
>
>   guix build /gnu/store/…-manual-database.drv
>
> or:
>
>   guix build /gnu/store/…-manual-database.drv --check
>
> if it had already been built before.

Somehow I can’t get guix to actually run my changed code with this
command, so I’m not sure I tested the right thing.

What is the clean approach to run the profile.scm from git?

>>> On a SSD and with a hot cache, on my 4-core laptop, I get 74s with
>>> ‘master’, and 53s with this patch.
>>
>> I’m using a machine with 6 physical cores, hyperthreading, and an NVMe
>> M.2 disk, so it is likely that it would not be disk-bound for me at 4
>> threads.
>
> The result may be entirely different with a spinning disk.  :-)
>
> I’m not saying we should optimize for spinning disks, just that what you
> see is at one end of the spectrum.

That’s right, yes.

> But anyway, another issue is that we would need to honor
> ‘parallel-job-count’, which means using ‘n-par-map’, which doesn’t use
> futures.

Ouch, yes. That’s an issue … 

Thank you for bringing it up!

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein
ohne es zu merken
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Thu, 18 Jul 2019 08:59:01 GMT) Full text and rfc822 format available.

Message #20 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Arne Babenhauserheide <arne_bab <at> web.de>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Thu, 18 Jul 2019 10:57:30 +0200
Hello,

Arne Babenhauserheide <arne_bab <at> web.de> skribis:

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>> Arne Babenhauserheide <arne_bab <at> web.de> skribis:
> Offtopic: I love reading Esperanto here!
>>> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>>>>   guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
>
>>> How exactly did you run the derivation?
>
>> If you run the command above, it’ll list
>> /gnu/store/…-manual-database.drv.  So you can just run:
>>
>>   guix build /gnu/store/…-manual-database.drv
>>
>> or:
>>
>>   guix build /gnu/store/…-manual-database.drv --check
>>
>> if it had already been built before.
>
> Somehow I can’t get guix to actually run my changed code with this
> command, so I’m not sure I tested the right thing.

Did you try the ‘guix environment -n’ command above?  Doesn’t it show
the manual-database.drv?

Alternately, you can also do something like:

  guix install -p /tmp/foo jupyter python-ipython python-ipykernel -n

HTH,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Thu, 18 Jul 2019 08:59:02 GMT) Full text and rfc822 format available.

Message #23 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Arne Babenhauserheide <arne_bab <at> web.de>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Thu, 18 Jul 2019 10:55:19 +0200
Hello,

Arne Babenhauserheide <arne_bab <at> web.de> skribis:

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>> Arne Babenhauserheide <arne_bab <at> web.de> skribis:
> Offtopic: I love reading Esperanto here!
>>> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>>>>   guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
>
>>> How exactly did you run the derivation?
>
>> If you run the command above, it’ll list
>> /gnu/store/…-manual-database.drv.  So you can just run:
>>
>>   guix build /gnu/store/…-manual-database.drv
>>
>> or:
>>
>>   guix build /gnu/store/…-manual-database.drv --check
>>
>> if it had already been built before.
>
> Somehow I can’t get guix to actually run my changed code with this
> command, so I’m not sure I tested the right thing.

Did you try the ‘guix environment -n’ command above?  Doesn’t it show
the manual-database.drv?

Alternately, you can also do something like:

  guix install -p /tmp/foo jupyter python-ipython python-ipykernel -n

HTH,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Thu, 18 Jul 2019 11:00:02 GMT) Full text and rfc822 format available.

Message #26 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Arne Babenhauserheide <arne_bab <at> web.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Thu, 18 Jul 2019 12:59:23 +0200
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:

> Did you try the ‘guix environment -n’ command above?  Doesn’t it show
> the manual-database.drv?

It does show the manual database, but then running guix build
/gnu/....drv --check does not run my changed code.

I’m doing

time ./pre-inst-env guix build /gnu/store/jnkxwwxk71n07fs6naa11fxmg3vpnnb3-manual-database.drv --check

But it runs the installed guix, not the local changes to profile.scm.

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein
ohne es zu merken
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Thu, 18 Jul 2019 13:47:01 GMT) Full text and rfc822 format available.

Message #29 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Arne Babenhauserheide <arne_bab <at> web.de>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Thu, 18 Jul 2019 15:46:18 +0200
Arne Babenhauserheide <arne_bab <at> web.de> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Did you try the ‘guix environment -n’ command above?  Doesn’t it show
>> the manual-database.drv?
>
> It does show the manual database, but then running guix build
> /gnu/....drv --check does not run my changed code.
>
> I’m doing
>
> time ./pre-inst-env guix build /gnu/store/jnkxwwxk71n07fs6naa11fxmg3vpnnb3-manual-database.drv --check
>
> But it runs the installed guix, not the local changes to profile.scm.

Right, sorry for being unclear: you need to run

  ./pre-inst-env guix environment -n …

That will run you modified code and thus create a manual-database.drv
that uses your code; it’s this manual-database.drv that you should pass
to ‘guix build’.

HTH,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Thu, 18 Jul 2019 22:50:02 GMT) Full text and rfc822 format available.

Message #32 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Arne Babenhauserheide <arne_bab <at> web.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Thu, 18 Jul 2019 22:03:03 +0200
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:

> Arne Babenhauserheide <arne_bab <at> web.de> skribis:
>
>> Ludovic Courtès <ludo <at> gnu.org> writes:
>>
>>> Did you try the ‘guix environment -n’ command above?  Doesn’t it show
>>> the manual-database.drv?
>>
>> It does show the manual database, but then running guix build
>> /gnu/....drv --check does not run my changed code.
>>
>> I’m doing
>>
>> time ./pre-inst-env guix build /gnu/store/jnkxwwxk71n07fs6naa11fxmg3vpnnb3-manual-database.drv --check
>>
>> But it runs the installed guix, not the local changes to profile.scm.
>
> Right, sorry for being unclear: you need to run
>
>   ./pre-inst-env guix environment -n …
>
> That will run you modified code and thus create a manual-database.drv
> that uses your code; it’s this manual-database.drv that you should pass
> to ‘guix build’.

That works now — thank you!

With the change:
185552 entries processed in 108.2 s
Before the change:
185552 entries processed in 220.1 s

The exact commands I’m running:
cd Dokumente/Guix/guix
git checkout master # with the change
./pre-inst-env guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
time guix build /gnu/...-manual-database.drv
git checkout 97bf46e64c11c64a968fdb833983ede6bdafbc00
./pre-inst-env guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
time guix build /gnu/...-manual-database.drv

So I also see roughly factor 2 speedup, which means a limit to 4 threads
should work.

(what I saw is that I only see the …manual-database.drv before I build
it, after it’s built, I no longer see it in the environment output)

How do I get the defined limit of cores and threads?

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein
ohne es zu merken
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Wed, 23 Oct 2019 20:00:02 GMT) Full text and rfc822 format available.

Message #35 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Arne Babenhauserheide <arne_bab <at> web.de>
To: 36630 <at> debbugs.gnu.org
Cc: Arne Babenhauserheide <arne_bab <at> web.de>
Subject: [PATCH] use two threads to build man-pages and secure output msgs
 with mutex
Date: Wed, 23 Oct 2019 21:59:18 +0200
---
 guix/profiles.scm | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/guix/profiles.scm b/guix/profiles.scm
index a0ae9a4c7e..9d2133df2d 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -1315,11 +1315,18 @@ the entries in MANIFEST."
                          (ice-9 threads)
                          (srfi srfi-1)
                          (srfi srfi-19))
+            ;; ensure conflict free user-output
+            (define status-output-mutex (make-mutex))
+
+            (define (status-message msg)
+              (lock-mutex status-output-mutex)
+              (display msg)
+              (unlock-mutex status-output-mutex))

             (define (compute-entry directory count total)
-                              (format #t "\r[~3d/~3d] building list of \
+                              (status-message (format #f "\r[~3d/~3d] building list of \
 man-db entries..."
-                                      count total)
+                                      count total))
                               (force-output)
                               (let ((man (string-append directory
                                                         "/share/man")))
@@ -1332,7 +1339,9 @@ man-db entries..."
               ;; decompression), so report progress as we traverse INPUTS.
               (let* ((inputs '#$(manifest-inputs manifest))
                      (total  (length inputs)))
-                (apply append (par-map compute-entry
+                ;; experimenting shows that two threads suffice to remove most
+                ;; waiting
+                (apply append (n-par-map 2 compute-entry
                                        inputs
                                        (iota total 1)
                                        (make-list total total)))))
--
2.23.0





Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Wed, 23 Oct 2019 20:03:01 GMT) Full text and rfc822 format available.

Message #38 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Arne Babenhauserheide <arne_bab <at> web.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 36630 <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Wed, 23 Oct 2019 22:01:46 +0200
[Message part 1 (text/plain, inline)]
Arne Babenhauserheide <arne_bab <at> web.de> writes:
>> Right, sorry for being unclear: you need to run
>>
>>   ./pre-inst-env guix environment -n …
>>
>> That will run you modified code and thus create a manual-database.drv
>> that uses your code; it’s this manual-database.drv that you should pass
>> to ‘guix build’.
>
> That works now — thank you!
…
> With the change:
> 185552 entries processed in 108.2 s
> Before the change:
> 185552 entries processed in 220.1 s
>
> The exact commands I’m running:
> cd Dokumente/Guix/guix
> git checkout master # with the change
> ./pre-inst-env guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
> time guix build /gnu/...-manual-database.drv
> git checkout 97bf46e64c11c64a968fdb833983ede6bdafbc00
> ./pre-inst-env guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
> time guix build /gnu/...-manual-database.drv

I now reduced the thread count to exactly 2 (to avoid running into
resource troubles; I hope that two should be safe) and added a mutex for
status messages to ensure that writes don’t overlap.

The patch should arrive shortly.

Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#36630; Package guix-patches. (Sun, 27 Oct 2019 16:29:02 GMT) Full text and rfc822 format available.

Message #41 received at 36630 <at> debbugs.gnu.org (full text, mbox):

From: Arne Babenhauserheide <arne_bab <at> web.de>
To: 36630 <at> debbugs.gnu.org
Cc: Arne Babenhauserheide <arne_bab <at> web.de>
Subject: [PATCH] use two threads to build man-pages and secure output msgs
 with mutex
Date: Sun, 27 Oct 2019 17:27:57 +0100
---
 guix/profiles.scm | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/guix/profiles.scm b/guix/profiles.scm
index a0ae9a4c7e..9d2133df2d 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -1315,11 +1315,18 @@ the entries in MANIFEST."
                          (ice-9 threads)
                          (srfi srfi-1)
                          (srfi srfi-19))
+            ;; ensure conflict free user-output
+            (define status-output-mutex (make-mutex))
+
+            (define (status-message msg)
+              (lock-mutex status-output-mutex)
+              (display msg)
+              (unlock-mutex status-output-mutex))

             (define (compute-entry directory count total)
-                              (format #t "\r[~3d/~3d] building list of \
+                              (status-message (format #f "\r[~3d/~3d] building list of \
 man-db entries..."
-                                      count total)
+                                      count total))
                               (force-output)
                               (let ((man (string-append directory
                                                         "/share/man")))
@@ -1332,7 +1339,9 @@ man-db entries..."
               ;; decompression), so report progress as we traverse INPUTS.
               (let* ((inputs '#$(manifest-inputs manifest))
                      (total  (length inputs)))
-                (apply append (par-map compute-entry
+                ;; experimenting shows that two threads suffice to remove most
+                ;; waiting
+                (apply append (n-par-map 2 compute-entry
                                        inputs
                                        (iota total 1)
                                        (make-list total total)))))
--
2.23.0





Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Tue, 31 Mar 2020 13:03:02 GMT) Full text and rfc822 format available.

Notification sent to arne_bab <at> web.de:
bug acknowledged by developer. (Tue, 31 Mar 2020 13:03:02 GMT) Full text and rfc822 format available.

Message #46 received at 36630-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Arne Babenhauserheide <arne_bab <at> web.de>
Cc: 36630-done <at> debbugs.gnu.org
Subject: Re: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Tue, 31 Mar 2020 15:02:47 +0200
Hi Arne,

Arne Babenhauserheide <arne_bab <at> web.de> skribis:

> I now reduced the thread count to exactly 2 (to avoid running into
> resource troubles; I hope that two should be safe) and added a mutex for
> status messages to ensure that writes don’t overlap.

It’s been 9 months but I finally committed a slightly modified variant
as ef4b5f2fed3ca13a0e15a821ba7e561cd4395aa6.  It turns out that the
mutex was unnecessary as ports are thread-safe.

As noted in the log, I see a 36% speedup on my SSD laptop with 4 cores
(slightly less with 2 cores).  It’s not great, but still an improvement!

Thanks,
Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 29 Apr 2020 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 334 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.