GNU bug report logs - #43516
[PATCH core-updates] packages: Enable multi-threaded xz compression when repacking source.

Package: guix-patches

Reported by: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Date: Sat, 19 Sep 2020 17:04:02 UTC

Severity: normal

Tags: patch

Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Bug is archived. No further changes may be made.


Report forwarded to guix-patches <at> gnu.org:
bug#43516; Package guix-patches. (Sat, 19 Sep 2020 17:04:02 GMT)

Acknowledgement sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Sat, 19 Sep 2020 17:04:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: guix-patches <at> gnu.org
Cc: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Subject: [PATCH core-updates] packages: Enable multi-threaded xz compression
 when repacking source.
Date: Sat, 19 Sep 2020 13:03:57 -0400
The xz compression is slow; using multiple threads/cores yields a linear
performance improvement.

* guix/packages.scm (patch-and-repack): Ensure xz is invoked with --threads=N
by setting the XZ_DEFAULTS environment variable.
---
 guix/packages.scm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/guix/packages.scm b/guix/packages.scm
index 6598bd3149..678007a807 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -5,6 +5,7 @@
 ;;; Copyright © 2016 Alex Kost <alezost <at> gmail.com>
 ;;; Copyright © 2017, 2019, 2020 Efraim Flashner <efraim <at> flashner.co.il>
 ;;; Copyright © 2019 Marius Bakke <mbakke <at> fastmail.com>
+;;; Copyright © 2020 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -693,6 +694,11 @@ specifies modules in scope when evaluating SNIPPET."
             (setenv "PATH" (string-append #+xz "/bin" ":"
                                           #+decomp "/bin"))
 
+            ;; Enable multi-threaded compression for xz.
+            (setenv "XZ_DEFAULTS" (string-append "--threads="
+                                                 (number->string
+                                                  (parallel-job-count))))
+
             ;; SOURCE may be either a directory or a tarball.
             (if (file-is-directory? #+source)
                 (let* ((store     (%store-directory))
-- 
2.28.0





Information forwarded to guix-patches <at> gnu.org:
bug#43516; Package guix-patches. (Sat, 19 Sep 2020 19:39:01 GMT)

Message #8 received at 43516 <at> debbugs.gnu.org:

From: Leo Famulari <leo <at> famulari.name>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 43516 <at> debbugs.gnu.org
Subject: Re: [bug#43516] [PATCH core-updates] packages: Enable multi-threaded
 xz compression when repacking source.
Date: Sat, 19 Sep 2020 15:38:05 -0400
On Sat, Sep 19, 2020 at 01:03:57PM -0400, Maxim Cournoyer wrote:
> The xz compression is slow; using multiple threads/cores yields a linear
> performance improvement.
> 
> * guix/packages.scm (patch-and-repack): Ensure xz is invoked with --threads=N
> by setting the XZ_DEFAULTS environment variable.

We tried this previously but reverted it because the archives were not
bit-reproducible:

https://git.savannah.gnu.org/cgit/guix.git/commit/?id=3e95125e9bd0676d4a9add9105217ad3eaef3ff0

It's really a shame... it would be nice to reduce the time used for XZ
compression. But the bandwidth used to move the results is even more
expensive in terms of time and money, since most people should get
substitutes.




Information forwarded to guix-patches <at> gnu.org:
bug#43516; Package guix-patches. (Tue, 22 Sep 2020 02:00:01 GMT)

Message #11 received at 43516 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: 43516 <at> debbugs.gnu.org
Cc: Leo Famulari <leo <at> famulari.name>
Date: Mon, 21 Sep 2020 22:00:02 -0400
Hi Leo!

> On Sat, Sep 19, 2020 at 01:03:57PM -0400, Maxim Cournoyer wrote:
> > The xz compression is slow; using multiple threads/cores yields a linear
> > performance improvement.
> >
> > * guix/packages.scm (patch-and-repack): Ensure xz is invoked with --threads=N
> > by setting the XZ_DEFAULTS environment variable.

> We tried this previously but reverted it because the archives were not
> bit-reproducible:

> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=3e95125e9bd0676d4a9add9105217ad3eaef3ff0

Thanks for bringing this to my attention!  I've studied what others have done
about it, and found a solution on the OpenEmbedded mailing list [0] that seems
to work well.  Debian uses something similar in dpkg [1].

The important point is that xz produces reproducible results as long as it
always operates in the same mode, either single-threaded OR multi-threaded
(the two modes produce different output for the same input).  So the
following v2 patch always passes at least --threads=2, forcing xz onto its
multi-threaded code path.  The --memlimit=50% argument caps xz's memory use
at half of the total physical memory; xz reduces the number of threads it
uses as needed to stay within that limit.
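
The argument selection described above can be sketched in shell.  This is
only an illustration of the "at least two threads" rule, not the code from
the patch (which is Scheme); the function name xz_parallel_args and its
default job count of 1 are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper mirroring the idea behind the patch's
# %xz-parallel-args procedure.  Argument: number of parallel jobs
# (defaulting to 1 here, which is an assumption of this sketch).
xz_parallel_args() {
    n=${1:-1}
    # Never go below 2 threads, so xz always takes its multi-threaded path.
    if [ "$n" -lt 2 ]; then
        n=2
    fi
    printf '%s\n' "--memlimit=50% --threads=$n"
}

# Usage (assumes xz and GNU tar are installed):
#   XZ_DEFAULTS=$(xz_parallel_args 4) tar -cJf source.tar.xz source/
```

xz reads XZ_DEFAULTS from the environment, so the options also apply when
tar invokes xz on our behalf.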

I've rebuilt the world on core-updates to test this and got impressive
results, such as when building the linux-libre source with 24 cores instead of
1:

$ time guix build --source linux-libre --check

With this change, on a 24 cores/32 GiB system: 24 cores used, 2.9 GiB max memory used, 36.76 s.
On master (same machine): 1 core used, 95 MiB max memory used, 4 m 10 s.
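
As a rough check of the two timings above (an added illustration, with a
hypothetical helper name), the ratio of wall-clock times works out to about
6.8.  The timings cover the whole `guix build --source` run, so this is a
speedup for the command as measured rather than for xz alone.

```shell
#!/bin/sh
# Hypothetical helper: ratio of baseline to new wall-clock time (seconds).
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", base / new }'
}

# 4 m 10 s on master is 250 s; the patched build took 36.76 s.
speedup 250 36.76
```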

[0]  https://patchwork.openembedded.org/patch/170475/
[1]  https://sources.debian.org/src/dpkg/1.19.7/lib/dpkg/compress.c/#L566-L574

> It's really a shame... it would be nice to reduce the time used for XZ
> compression.

Seems we can have our cake and eat it, too!

Maxim





Information forwarded to guix-patches <at> gnu.org:
bug#43516; Package guix-patches. (Tue, 22 Sep 2020 02:00:02 GMT)

Message #14 received at 43516 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: 43516 <at> debbugs.gnu.org
Cc: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>,
 Leo Famulari <leo <at> famulari.name>
Subject: [PATCH core-updates v2] packages: Enable multi-threaded xz
 compression when repacking source.
Date: Mon, 21 Sep 2020 22:00:03 -0400
The xz compression is slow; using multiple threads/cores yields a linear
performance improvement.

* guix/build/utils.scm (%xz-parallel-args): New procedure.
* guix/packages.scm (patch-and-repack): Specify the required above xz
arguments by setting the XZ_DEFAULTS environment variable.
* guix/scripts/pack.scm (%compressors, bootstrap-xz): Modify the commands
Gexps so they do not need to be quoted.  This allows lazily evaluating the
arguments on the builder's side.  Specify the required xz arguments.
(self-contained-tarball): Do not quote the compressor command value.
(docker-image): Likewise.
* guix/utils.scm (decompressed-port, compressed-port)
(compressed-output-port): Specify the required above xz arguments.
---
 guix/build/utils.scm  | 15 ++++++++++++++-
 guix/packages.scm     |  3 +++
 guix/scripts/pack.scm | 27 ++++++++++++++++++---------
 guix/utils.scm        | 10 ++++++----
 4 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index e884c26a22..ff4241d088 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -112,7 +112,9 @@
 
             make-desktop-entry-file
 
-            locale-category->string))
+            locale-category->string
+
+            %xz-parallel-args))
 
 
 ;;;
@@ -1479,6 +1481,17 @@ returned."
              LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE
              LC_TIME)))
 
+
+;;;
+;;; Others.
+;;;
+
+(define (%xz-parallel-args)
+  "The xz arguments required to enable bit-reproducible, multi-threaded
+compression."
+  (list "--memlimit=50%"
+        (format #f "--threads=~a" (max 2 (parallel-job-count)))))
+
 ;;; Local Variables:
 ;;; eval: (put 'call-with-output-file/atomic 'scheme-indent-function 1)
 ;;; eval: (put 'call-with-ascii-input-file 'scheme-indent-function 1)
diff --git a/guix/packages.scm b/guix/packages.scm
index 6598bd3149..865cb81929 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -5,6 +5,7 @@
 ;;; Copyright © 2016 Alex Kost <alezost <at> gmail.com>
 ;;; Copyright © 2017, 2019, 2020 Efraim Flashner <efraim <at> flashner.co.il>
 ;;; Copyright © 2019 Marius Bakke <mbakke <at> fastmail.com>
+;;; Copyright © 2020 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -693,6 +694,8 @@ specifies modules in scope when evaluating SNIPPET."
             (setenv "PATH" (string-append #+xz "/bin" ":"
                                           #+decomp "/bin"))
 
+            (setenv "XZ_DEFAULTS" (string-join (%xz-parallel-args)))
+
             ;; SOURCE may be either a directory or a tarball.
             (if (file-is-directory? #+source)
                 (let* ((store     (%store-directory))
diff --git a/guix/scripts/pack.scm b/guix/scripts/pack.scm
index 379e6a3ac6..a0112162e3 100644
--- a/guix/scripts/pack.scm
+++ b/guix/scripts/pack.scm
@@ -5,6 +5,7 @@
 ;;; Copyright © 2018 Chris Marusich <cmmarusich <at> gmail.com>
 ;;; Copyright © 2018 Efraim Flashner <efraim <at> flashner.co.il>
 ;;; Copyright © 2020 Tobias Geerinckx-Rice <me <at> tobias.gr>
+;;; Copyright © 2020 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -25,6 +26,7 @@
   #:use-module (guix scripts)
   #:use-module (guix ui)
   #:use-module (guix gexp)
+  #:use-module ((guix build utils) #:select (%xz-parallel-args))
   #:use-module (guix utils)
   #:use-module (guix store)
   #:use-module ((guix status) #:select (with-status-verbosity))
@@ -70,29 +72,34 @@
   compressor?
   (name       compressor-name)      ;string (e.g., "gzip")
   (extension  compressor-extension) ;string (e.g., ".lz")
-  (command    compressor-command))  ;gexp (e.g., #~("/gnu/store/…/gzip" "-9n"))
+  (command    compressor-command))  ;gexp (e.g., #~(list "/gnu/store/…/gzip"
+                                    ;                    "-9n" ))
 
 (define %compressors
   ;; Available compression tools.
   (list (compressor "gzip"  ".gz"
-                    #~(#+(file-append gzip "/bin/gzip") "-9n"))
+                    #~(list #+(file-append gzip "/bin/gzip") "-9n"))
         (compressor "lzip"  ".lz"
-                    #~(#+(file-append lzip "/bin/lzip") "-9"))
+                    #~(list #+(file-append lzip "/bin/lzip") "-9"))
         (compressor "xz"    ".xz"
-                    #~(#+(file-append xz "/bin/xz") "-e"))
+                    #~(append (list #+(file-append xz "/bin/xz")
+                                    "-e")
+                              (%xz-parallel-args)))
         (compressor "bzip2" ".bz2"
-                    #~(#+(file-append bzip2 "/bin/bzip2") "-9"))
+                    #~(list #+(file-append bzip2 "/bin/bzip2") "-9"))
         (compressor "zstd" ".zst"
                     ;; The default level 3 compresses better than gzip in a
                     ;; fraction of the time, while the highest level 19
                     ;; (de)compresses more slowly and worse than xz.
-                    #~(#+(file-append zstd "/bin/zstd") "-3"))
+                    #~(list #+(file-append zstd "/bin/zstd") "-3"))
         (compressor "none" "" #f)))
 
 ;; This one is only for use in this module, so don't put it in %compressors.
 (define bootstrap-xz
   (compressor "bootstrap-xz" ".xz"
-              #~(#+(file-append %bootstrap-coreutils&co "/bin/xz") "-e")))
+              #~(append (list #+(file-append %bootstrap-coreutils&co "/bin/xz")
+                              "-e")
+                        (%xz-parallel-args))))
 
 (define (lookup-compressor name)
   "Return the compressor object called NAME.  Error out if it could not be
@@ -269,7 +276,7 @@ added to the pack."
                            #+@(if (compressor-command compressor)
                                   #~("-I"
                                      (string-join
-                                      '#+(compressor-command compressor)))
+                                      #+(compressor-command compressor)))
                                   #~())
                            "--format=gnu"
 
@@ -541,11 +548,13 @@ the image."
                                ,@(source-module-closure
                                   `((guix docker)
                                     (guix build store-copy)
+                                    (guix build utils) ;for %xz-parallel-args
                                     (guix profiles)
                                     (guix search-paths))
                                   #:select? not-config?))
         #~(begin
             (use-modules (guix docker) (guix build store-copy)
+                         (guix build utils)
                          (guix profiles) (guix search-paths)
                          (srfi srfi-1) (srfi srfi-19)
                          (ice-9 match))
@@ -602,7 +611,7 @@ the image."
                                        #~(list (string-append #$profile "/"
                                                               #$entry-point)))
                                 #:extra-files directives
-                                #:compressor '#+(compressor-command compressor)
+                                #:compressor #+(compressor-command compressor)
                                 #:creation-time (make-time time-utc 0 1))))))
 
   (gexp->derivation (string-append name ".tar"
diff --git a/guix/utils.scm b/guix/utils.scm
index 7cc321205e..ba896623f4 100644
--- a/guix/utils.scm
+++ b/guix/utils.scm
@@ -8,6 +8,7 @@
 ;;; Copyright © 2017 Mathieu Othacehe <m.othacehe <at> gmail.com>
 ;;; Copyright © 2018, 2020 Marius Bakke <marius <at> gnu.org>
 ;;; Copyright © 2020 Efraim Flashner <efraim <at> flashner.co.il>
+;;; Copyright © 2020 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -37,7 +38,7 @@
   #:use-module (guix memoization)
   #:use-module ((guix build utils)
                 #:select (dump-port mkdir-p delete-file-recursively
-                          call-with-temporary-output-file))
+                          call-with-temporary-output-file %xz-parallel-args))
   #:use-module ((guix build syscalls) #:select (mkdtemp! fdatasync))
   #:use-module (guix diagnostics)           ;<location>, &error-location, etc.
   #:use-module (ice-9 format)
@@ -220,7 +221,7 @@ a symbol such as 'xz."
   (match compression
     ((or #f 'none) (values input '()))
     ('bzip2        (filtered-port `(,%bzip2 "-dc") input))
-    ('xz           (filtered-port `(,%xz "-dc") input))
+    ('xz           (filtered-port `(,%xz "-dc" ,@(%xz-parallel-args)) input))
     ('gzip         (filtered-port `(,%gzip "-dc") input))
     ('lzip         (values (lzip-port 'make-lzip-input-port input)
                            '()))
@@ -232,7 +233,7 @@ a symbol such as 'xz."
   (match compression
     ((or #f 'none) (values input '()))
     ('bzip2        (filtered-port `(,%bzip2 "-c") input))
-    ('xz           (filtered-port `(,%xz "-c") input))
+    ('xz           (filtered-port `(,%xz "-c" ,@(%xz-parallel-args)) input))
     ('gzip         (filtered-port `(,%gzip "-c") input))
     ('lzip         (values (lzip-port 'make-lzip-input-port/compressed input)
                            '()))
@@ -291,7 +292,8 @@ program--e.g., '(\"--fast\")."
   (match compression
     ((or #f 'none) (values output '()))
     ('bzip2        (filtered-output-port `(,%bzip2 "-c" ,@options) output))
-    ('xz           (filtered-output-port `(,%xz "-c" ,@options) output))
+    ('xz           (filtered-output-port `(,%xz "-c" ,@(%xz-parallel-args)
+                                                ,@options) output))
     ('gzip         (filtered-output-port `(,%gzip "-c" ,@options) output))
     ('lzip         (values (lzip-port 'make-lzip-output-port output)
                            '()))
-- 
2.28.0





Information forwarded to guix-patches <at> gnu.org:
bug#43516; Package guix-patches. (Tue, 22 Sep 2020 15:20:02 GMT)

Message #17 received at 43516 <at> debbugs.gnu.org:

From: Leo Famulari <leo <at> famulari.name>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 43516 <at> debbugs.gnu.org
Subject: Re: your mail
Date: Tue, 22 Sep 2020 11:19:21 -0400
On Mon, Sep 21, 2020 at 10:00:02PM -0400, Maxim Cournoyer wrote:
> Seems we can have our cake and eat it, too!

Amazing! I don't have time to check it myself but please proceed as you
see fit.




Reply sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
You have taken responsibility. (Fri, 09 Oct 2020 02:18:02 GMT)

Notification sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
bug acknowledged by developer. (Fri, 09 Oct 2020 02:18:02 GMT)

Message #22 received at 43516-done <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Leo Famulari <leo <at> famulari.name>
Cc: 43516-done <at> debbugs.gnu.org
Subject: Re: [bug#43516] your mail
Date: Thu, 08 Oct 2020 22:17:36 -0400
Leo Famulari <leo <at> famulari.name> writes:

> On Mon, Sep 21, 2020 at 10:00:02PM -0400, Maxim Cournoyer wrote:
>> Seems we can have our cake and eat it, too!
>
> Amazing! I don't have time to check it myself but please proceed as you
> see fit.

Pushed with commit 5a0997ef7f to core-updates.

Enjoy!

Maxim




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 06 Nov 2020 12:24:04 GMT)
